.NET Performance: Using Span<T>

Introduced in .NET Core 2.1, Span<T> and ReadOnlySpan<T> have become widely adopted in .NET applications, both in user code and in the BCL. But what are those types used for exactly? Why is everybody talking about them? When and how should you use them—and what are they good for?

In this blog post, I’ll answer all of these questions and explore the performance benefits they bring to the table.

What is Span<T>?

Span<T> shipped with .NET Core 2.1, building on the ref struct support that arrived in C# 7.2, and is treated as one of the fundamental additions to the .NET platform despite being fairly simple from an implementation standpoint. It is a type-safe representation of a contiguous region of memory. The region is arbitrary and there are multiple ways to specify it, e.g. using arrays, strings, pointers and more.

Span<T> offers a rich API that resembles that of arrays, making it intuitive to use for developers already familiar with array concepts, even though there are some important nuances to Span<T> that you should understand before using it.

This is what the internal state of Span<T> looks like:

readonly ref struct Span<T>
{
    readonly ref T _reference;

    readonly int _length;
}

The region is represented by a managed reference to its first element and its length. Operations like element access, enumeration and search are then implemented by computing offsets from this reference.
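To make the "reference plus length" idea concrete, here is a minimal sketch (the variable names are mine, not from any library) that creates spans over an array, a string and stack memory, and shows that a slice is just a view over the same memory:

```csharp
using System;

int[] numbers = [10, 20, 30, 40, 50];

// A span over the whole array, and a slice covering elements 1..3.
Span<int> all = numbers;              // implicit conversion from T[]
Span<int> middle = all.Slice(1, 3);   // no copy: same memory, new start + length

// Writing through the slice changes the underlying array.
middle[0] = 99;

// A string can be viewed as a ReadOnlySpan<char> without copying.
ReadOnlySpan<char> hello = "hello world".AsSpan(0, 5);

// Stack memory can back a span as well.
Span<byte> scratch = stackalloc byte[4];
scratch.Fill(0xFF);

Console.WriteLine(numbers[1]);        // 99
Console.WriteLine(middle.Length);     // 3
Console.WriteLine(hello.ToString());  // hello
Console.WriteLine(scratch[3]);        // 255
```

Note that ToString() on the span allocates a new string; it is used here only for printing.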

Using Span<T>

Span<T> is powerful because it can be used in many common scenarios with little to no overhead compared to arrays, lists, or interfaces like ICollection<T> or IList<T>. Spans can also be used for lower level constructs like pointers, inline arrays or buffers.

There are multiple reasons to prefer using Span<T> over these alternatives:

  • It is a ref struct: it lives only on the stack, so creating or slicing a span does not allocate on the heap or add GC pressure
  • It has an intuitive API
  • It is recognized by the C# compiler: typical operations like foreach work out of the box
  • It is optimized by the .NET runtime: certain usage patterns may trigger runtime-level optimizations
  • It enables uniform handling of different data sources: arrays, spans, pointers, etc.

There is also a read-only version of the type, ReadOnlySpan<T>. It follows the same idea as Span<T> but has a narrower API that does not allow writing to the underlying memory region. One of the primary use cases for ReadOnlySpan<T> is strings: there is an implicit conversion from string to ReadOnlySpan<char>, which we are going to look into shortly.
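As a small illustration of the string conversion, the sketch below parses the parts of a date string straight from slices of a ReadOnlySpan<char>, without creating any substrings. The input value and variable names are made up for the example:

```csharp
using System;
using System.Globalization;

string date = "2024-05-17";

// Implicit conversion: no copy, the span points at the string's characters.
ReadOnlySpan<char> chars = date;

// Slice out each component and parse it directly from the span.
int year  = int.Parse(chars[..4],        NumberStyles.None, CultureInfo.InvariantCulture);
int month = int.Parse(chars.Slice(5, 2), NumberStyles.None, CultureInfo.InvariantCulture);
int day   = int.Parse(chars[8..],        NumberStyles.None, CultureInfo.InvariantCulture);

// chars[0] = 'x';  // would not compile: ReadOnlySpan<T> has no writable indexer

Console.WriteLine($"{year} {month} {day}");  // 2024 5 17
```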

Ultimately, spans are used extensively in .NET, and leveraging them may significantly improve performance and optimize memory allocations in hot paths.

We’re now going to take a look at how to use spans with common data types, along with benchmarks for common scenarios.

Span<T> vs T[]

One of the most common use cases for Span<T> is working with arrays. Arrays can be implicitly converted to Span<T> because they represent a contiguous region of memory.

Consider a method from a third-party library that we don’t control. The method XORs the elements of the array and returns the result:

public static int XorArray(int[] array)
{
    int result = 0;
    foreach (var x in array) result ^= x;
    return result;
}

Our use case is that there is an array in our system, and we want to compute the sum of the XORs of its two halves:

int[] array = Enumerable.Range(0, 100).ToArray();
int xorSum = XorArray(array[..50]) + XorArray(array[50..]);

This code will work just fine, but it has a cost: a new instance of the array has to be created every time the range ([..]) operator is used. So we end up doing two allocations every time this operation is performed. Now, suppose this method is declared like this, using Span<int>:

public static int XorSpan(Span<int> span)
{
    int result = 0;
    foreach (var x in span) result ^= x;
    return result;
}

Then, we could rewrite the code using AsSpan() and pass that to the method instead:

int[] array = Enumerable.Range(0, 100).ToArray();
int xorSum = XorSpan(array.AsSpan()[..50]) + XorSpan(array.AsSpan()[50..]);

This version of the code avoids the unnecessary allocations because it uses the range operator on a Span<int> instead of an array, while the result stays the same. Even better, if the library method accepts a Span<int>, we can use it not only for arrays, but for all types that can be represented as a Span<int>.
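Here is a short sketch of what "all types that can be represented as a Span<int>" can look like in practice. It reuses the XorSpan method from above and feeds it an array, a slice, a List<int> (through CollectionsMarshal.AsSpan) and stack memory; the sample values are arbitrary:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

static int XorSpan(Span<int> span)
{
    int result = 0;
    foreach (var x in span) result ^= x;
    return result;
}

int[] array = [1, 2, 3, 4];
List<int> list = [1, 2, 3, 4];
Span<int> stack = stackalloc int[] { 1, 2, 3, 4 };

int fromArray = XorSpan(array);                            // implicit T[] -> Span<T>
int fromSlice = XorSpan(array.AsSpan(2));                  // elements 3 and 4 only
int fromList  = XorSpan(CollectionsMarshal.AsSpan(list));  // view over the list's backing array
int fromStack = XorSpan(stack);

Console.WriteLine($"{fromArray} {fromSlice} {fromList} {fromStack}");  // 4 7 4 4
```

One caveat with CollectionsMarshal.AsSpan: the list should not be resized while the span is in use, because the span keeps pointing at the old backing array.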

Let’s compare the performance of these two approaches using BenchmarkDotNet:

[MemoryDiagnoser]
public class Benchmark
{
    private readonly int[] _array = Enumerable.Range(0, 1_000_000).ToArray();

    [Benchmark]
    public int XorArray()
    {
        return XorArray(_array[..500_000]) + XorArray(_array[500_000..]);
    }
    
    [Benchmark]
    public int XorSpan()
    {
        return XorSpan(_array.AsSpan()[..500_000]) + XorSpan(_array.AsSpan()[500_000..]);
    }
    
    private int XorArray(int[] array)
    {
        int result = 0;

        foreach (var x in array)
        {
            result ^= x;
        }
    
        return result;
    }
    
    private int XorSpan(Span<int> span)
    {
        int result = 0;

        foreach (var x in span)
        {
            result ^= x;
        }
    
        return result;
    }
}

| Method   | Mean     | Error   | StdDev  | Gen0     | Gen1     | Gen2     | Allocated |
|--------- |---------:|--------:|--------:|---------:|---------:|---------:|----------:|
| XorArray | 574.8 μs | 7.87 μs | 7.37 μs | 663.0859 | 663.0859 | 663.0859 | 4000255 B |
| XorSpan  | 185.6 μs | 2.04 μs | 1.90 μs |        - |        - |        - |         - |

The version that uses spans is allocation-free, while the version that uses arrays allocates 4MB (1 million ints) just to do this simple calculation. Plus, the method that uses spans takes ~68% less time (about 3.1x faster).
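The allocation difference comes down to what the range operator does on each type. A tiny sketch (values chosen arbitrarily) shows that a range on an array produces an independent copy, while a range on a span produces a view over the same memory:

```csharp
using System;

int[] array = [1, 2, 3, 4];

// Range on an array: allocates a new int[] and copies the elements.
int[] copy = array[..2];
copy[0] = 100;
int afterCopyWrite = array[0];
Console.WriteLine(afterCopyWrite);  // 1, the original array is untouched

// Range on a span: only a new (reference, length) pair over the same memory.
Span<int> view = array.AsSpan()[..2];
view[0] = 100;
Console.WriteLine(array[0]);        // 100, the write went through to the array
```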

Throughout this post we’ll see more benchmarks, and as it turns out, Span<T> is faster not only because of memory allocations.

ReadOnlySpan<T> vs string

As mentioned earlier, there is an implicit conversion from string to ReadOnlySpan<char> (not the other way around), which is possible due to how string is represented in .NET: it is essentially a read-only array of characters.

Most of the operations on string are available on ReadOnlySpan<char>, and sometimes the span is even more convenient. For example, there is the EnumerateLines() method:

foreach (var lineSpan in _string.AsSpan().EnumerateLines())
{
    // ..
}

Many common operations on strings allocate other strings. The notorious example is string.Split:

int words = 0;
foreach (string _ in _string.Split(' ')) words++;
return words;

In this code, a new string is allocated for every word. Plus, the resulting array from Split() is also allocated.

Now compare that with the span-based version:

int words = 0;
foreach (Range _ in _string.AsSpan().Split(' ')) words++;
return words;

Notice how the Split method uses Range values instead of strings to represent slices of the original span. This allows an allocation-free split, and the actual words can then be accessed by slicing the span:

ReadOnlySpan<char> span = _string.AsSpan();
        
foreach (Range range in span.Split(' '))
{
    var word = span[range];
    // use 'word' as a span
}

Under the hood, this version (available since .NET 9) uses the MemoryExtensions.SpanSplitEnumerator<T> struct instead of an array, which means there are no heap allocations at all.
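As far as I know, the span-based Split overload used above is only available on recent runtimes. On older targets, the same allocation-free idea can be sketched by hand with IndexOf and slicing. CountWords below is a hypothetical helper written for this post, and unlike Split(' ') it skips empty entries produced by consecutive spaces:

```csharp
using System;

static int CountWords(ReadOnlySpan<char> text)
{
    int words = 0;

    while (true)
    {
        int space = text.IndexOf(' ');

        // The current word is everything up to the next space (or the rest of the text).
        ReadOnlySpan<char> word = space < 0 ? text : text[..space];
        if (!word.IsEmpty) words++;

        if (space < 0) break;

        // Move past the space; this only adjusts the reference and length.
        text = text[(space + 1)..];
    }

    return words;
}

int count = CountWords("Lorem ipsum dolor sit amet");
Console.WriteLine(count);  // 5
```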

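Another common case where spans can help is regular expressions. The Regex.Matches method returns a MatchCollection, which includes not just the matched strings but also related metadata stored in Match objects, each of which allocates. Regex.EnumerateMatches, available since .NET 7, instead yields lightweight ValueMatch structs that only carry the index and length of each match.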
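Here is a minimal sketch of how the span-based regex API can be consumed. Each ValueMatch exposes only Index and Length, so the matched text is obtained by slicing the input span. The input text is a shortened version of the benchmark string:

```csharp
using System;
using System.Text.RegularExpressions;

ReadOnlySpan<char> text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do";

int matches = 0;
string first = "";

// Words that are immediately followed by a comma.
foreach (ValueMatch m in Regex.EnumerateMatches(text, @"\b\w+(?=,)"))
{
    ReadOnlySpan<char> word = text.Slice(m.Index, m.Length);  // no allocation

    // ToString() allocates; it is done once here only to show the result.
    if (matches == 0) first = word.ToString();
    matches++;
}

Console.WriteLine($"{matches} {first}");  // 2 amet
```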

Let’s run some benchmarks to compare performance of common string operations using string vs ReadOnlySpan<char>:

[MemoryDiagnoser]
public class Benchmark
{
    private readonly string _string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua!";
    
    [Benchmark]
    public int IndexOfSpan()
    {
        return _string.AsSpan().IndexOf("elit");
    }
    
    [Benchmark]
    public int IndexOfString()
    {
        return _string.IndexOf("elit");
    }
    
    [Benchmark]
    public ReadOnlySpan<char> RangeSpan()
    {
        var ipsum = _string.AsSpan()[20..42];
        return ipsum;
    }
    
    [Benchmark]
    public string RangeString()
    {
        var ipsum = _string[20..42];
        return ipsum;
    }

    [Benchmark]
    public int SplitSpan()
    {
        int words = 0;

        foreach (var _ in _string.AsSpan().Split(' '))
        {
            words++;
        }
        
        return words;
    }

    [Benchmark]
    public int SplitString()
    {
        int words = 0;

        foreach (var _ in _string.Split(' '))
        {
            words++;
        }
        
        return words;
    }
    
    [Benchmark]
    public int RegexMatchSpan()
    {
        int matches = 0;

        foreach (var _ in Regex.EnumerateMatches(_string.AsSpan(), @"\b\w+(?=,)"))
        {
            matches++;
        }
        
        return matches;
    }
    
    [Benchmark]
    public int RegexMatchString()
    {
        int matches = 0;
        
        foreach (var _ in Regex.Matches(_string, @"\b\w+(?=,)"))
        {
            matches++;
        }
        
        return matches;
    }
}

| Method           | Mean          | Error      | StdDev     | Gen0   | Gen1   | Allocated |
|----------------- |--------------:|-----------:|-----------:|-------:|-------:|----------:|
| IndexOfSpan      |     4.1893 ns |  0.0277 ns |  0.0259 ns |      - |      - |         - |
| IndexOfString    |    64.4326 ns |  0.1520 ns |  0.1348 ns |      - |      - |         - |
| RangeSpan        |     0.3720 ns |  0.0043 ns |  0.0040 ns |      - |      - |         - |
| RangeString      |     4.1759 ns |  0.0529 ns |  0.0494 ns | 0.0038 |      - |      72 B |
| SplitSpan        |    94.3495 ns |  0.5499 ns |  0.5144 ns |      - |      - |         - |
| SplitString      |   110.9059 ns |  0.6182 ns |  0.5481 ns | 0.0463 | 0.0001 |     872 B |
| RegexMatchSpan   | 1,908.3754 ns | 11.5733 ns | 10.8257 ns |      - |      - |         - |
| RegexMatchString | 2,070.0914 ns |  7.5323 ns |  6.6772 ns | 0.0305 |      - |     592 B |

The span-based versions consistently outperform their string counterparts, sometimes by a wide margin. Span versions perform zero allocations, while the string versions often incur both object and array allocations. Even IndexOf, which is usually lightweight, shows a noticeable speedup with spans, although part of that gap is not span-specific: string.IndexOf(string) performs a culture-sensitive search by default, whereas the span overload compares ordinally.

Span<T> with stackalloc

Spans can be directly backed by stack-allocated memory using the stackalloc keyword:

Span<byte> bytes = stackalloc byte[1024];

This operation is significantly faster than any heap allocation and helps optimize hot paths where the collection is short-lived and scoped to a method.

NOTE: you should always keep in mind that using stackalloc may result in StackOverflowException if the allocated region is too large. Allowing an open-ended number of elements in the span while using stackalloc makes the system vulnerable to denial-of-service (DoS) attacks. Additionally, avoid using stackalloc inside loops, as repeated stack allocations can quickly exhaust stack memory. Consider extracting stackalloc outside the loop.

One way to guard against this is to use stackalloc for smaller buffers and fall back to heap-based allocation (e.g. via ArrayPool<T>) when the size exceeds a safe threshold:

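const int threshold = 4096;

byte[]? rented = null;
Span<byte> span = sizeof(byte) * length <= threshold
    ? stackalloc byte[length]
    // Rent may return a larger array, so slice to the requested length
    : (rented = ArrayPool<byte>.Shared.Rent(length)).AsSpan(0, length);

// ... use span, then return the rented array (if any) to the pool
if (rented is not null) ArrayPool<byte>.Shared.Return(rented);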

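This approach gives you the best of both worlds: stack-based allocation when working with small spans, and a pooled heap buffer for larger sizes. Keep in mind that ArrayPool<T>.Rent may hand back an array longer than requested, and that a rented array should be returned to the pool once you are done with it.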
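A slightly fuller sketch of this pattern is shown below. FillAndSum and StackThreshold are hypothetical names made up for illustration. The sketch uses a constant-size stackalloc, slices the buffer to the requested length, and returns the rented array in a finally block so it is not leaked if the work throws:

```csharp
using System;
using System.Buffers;

static int FillAndSum(int length)
{
    const int StackThreshold = 256;  // assumed "safe" size for this example

    byte[]? rented = null;
    Span<byte> buffer = length <= StackThreshold
        ? stackalloc byte[StackThreshold]
        : (rented = ArrayPool<byte>.Shared.Rent(length));

    // Both branches may give us more than we asked for, so trim to the exact length.
    buffer = buffer[..length];

    try
    {
        // Placeholder work: fill the buffer with ones and add them up.
        buffer.Fill(1);

        int sum = 0;
        foreach (byte b in buffer) sum += b;
        return sum;
    }
    finally
    {
        if (rented is not null) ArrayPool<byte>.Shared.Return(rented);
    }
}

Console.WriteLine(FillAndSum(10));    // 10   (stack path)
Console.WriteLine(FillAndSum(1000));  // 1000 (pooled path)
```

Pooled arrays are not cleared by default, so the code should never assume that a rented buffer starts out zeroed.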

Span<T> and JIT

The .NET runtime team continuously improves the just-in-time (JIT) compiler, and one of its prominent features is dynamic profile-guided optimization (PGO). It allows the JIT to generate more efficient code over time as the application runs. Recent .NET versions have introduced many enhancements to dynamic PGO, including its handling of spans.

Dynamic PGO uses tiered compilation, which allows the JIT to substitute different assembly code for the same method during the lifetime of the application. When the app starts, the JIT emits lower quality assembly code but can recompile the methods with higher quality code over time as they become hot.

The actual assembly code that is emitted by the JIT depends on your OS and architecture. Examples in this blog post are produced on Windows 11 x64.

Suppose we have a method that checks whether an array contains a specific value. We want to keep it allocation-free, but at the same time have the ability to use it on a region of the array:

static bool ContainsArray(int[] array, int startIndex, int length)
{
    for (int i = startIndex; i < startIndex + length; i++)
    {
        if (array[i] == 44)
        {
            return true;
        }
    }
    
    return false;
}

Then, let’s run this method in a loop and see what the JIT emits for it:

int[] array = [42, 43, 44, 45, 46];

while (true)
{
    ContainsArray(array, 1, 3);
}

; Instrumented Tier0 code

G_M000_IG03:                ;; offset=0x0029
       mov      rax, gword ptr [rbp+0x10]
       mov      ecx, dword ptr [rbp-0x3C]
       cmp      ecx, dword ptr [rax+0x08]
       jae      SHORT G_M000_IG10
       mov      edx, ecx
       lea      rax, bword ptr [rax+4*rdx+0x10]
       cmp      dword ptr [rax], 44
       jne      SHORT G_M000_IG05
       mov      rcx, 0x7FFA29AB7970
       call     CORINFO_HELP_COUNTPROFILE32
       mov      eax, 1

G_M000_IG04:                ;; offset=0x0055
       add      rsp, 112
       pop      rbp
       ret

G_M000_IG05:                ;; offset=0x005B
       mov      rcx, 0x7FFA29AB7974
       call     CORINFO_HELP_COUNTPROFILE32
       mov      eax, dword ptr [rbp-0x3C]
       inc      eax
       mov      dword ptr [rbp-0x3C], eax

G_M000_IG06:                ;; offset=0x0072
       mov      eax, dword ptr [rbp-0x48]
       dec      eax
       mov      dword ptr [rbp-0x48], eax
       cmp      dword ptr [rbp-0x48], 0
       jg       SHORT G_M000_IG08

G_M000_IG07:                ;; offset=0x0080
       lea      rcx, [rbp-0x48]
       mov      edx, 17
       call     CORINFO_HELP_PATCHPOINT

G_M000_IG08:                ;; offset=0x008E
       mov      eax, dword ptr [rbp+0x18]
       add      eax, dword ptr [rbp+0x20]
       cmp      dword ptr [rbp-0x3C], eax
       jl       SHORT G_M000_IG03
       mov      rcx, 0x7FFA29AB7978
       call     CORINFO_HELP_COUNTPROFILE32
       xor      eax, eax

G_M000_IG09:                ;; offset=0x00AA
       add      rsp, 112
       pop      rbp
       ret

G_M000_IG10:                ;; offset=0x00B0
       call     CORINFO_HELP_RNGCHKFAIL
       int3

“Tier0” indicates that the method has just been compiled by the JIT for the first time. Even though the method logic is simple, the code is quite verbose: there are many mov instructions shuffling values between registers and the stack, calls to profiling helpers, and a range check on every iteration because the JIT can’t prove that i always stays within bounds. In this experiment, tiered compilation never really kicks in for this method, so it stays at its initial Tier0 throughout the loop.

Let’s consider an alternative version that uses Span<T>:

static bool ContainsSpan(Span<int> span)
{
    foreach (var x in span)
    {
        if (x == 44)
        {
            return true;
        }
    }
    
    return false;
}

There is no need to pass the startIndex and length because the span can be sliced allocation-free on the caller side. Then, there is the same loop in Program.cs:

int[] array = [42, 43, 44, 45, 46];

while (true)
{
    var span = array.AsSpan()[1..4];
    ContainsSpan(span);
}

When this version of the method is JITted in Tier0, we still see verbose code:

; Instrumented Tier0 code

G_M000_IG05:                ;; offset=0x003D
       mov      eax, dword ptr [rbp-0x40]
       cmp      dword ptr [rbp-0x4C], eax
       jae      G_M000_IG12
       mov      eax, dword ptr [rbp-0x4C]
       mov      eax, eax
       mov      rcx, bword ptr [rbp-0x48]
       cmp      dword ptr [rcx+4*rax], 44
       jne      SHORT G_M000_IG07
       mov      rcx, 0x7FFA29A8B408
       call     CORINFO_HELP_COUNTPROFILE32
       mov      eax, 1

G_M000_IG06:                ;; offset=0x006C
       add      rsp, 128
       pop      rbp
       ret

G_M000_IG07:                ;; offset=0x0075
       mov      rcx, 0x7FFA29A8B40C
       call     CORINFO_HELP_COUNTPROFILE32
       mov      eax, dword ptr [rbp-0x4C]
       inc      eax
       mov      dword ptr [rbp-0x4C], eax

G_M000_IG08:                ;; offset=0x008C
       mov      eax, dword ptr [rbp-0x58]
       dec      eax
       mov      dword ptr [rbp-0x58], eax
       cmp      dword ptr [rbp-0x58], 0
       jg       SHORT G_M000_IG10

G_M000_IG09:                ;; offset=0x009A
       lea      rcx, [rbp-0x58]
       mov      edx, 25
       call     CORINFO_HELP_PATCHPOINT

G_M000_IG10:                ;; offset=0x00A8
       mov      eax, dword ptr [rbp-0x4C]
       cmp      eax, dword ptr [rbp-0x40]
       jl       SHORT G_M000_IG05
       mov      rcx, 0x7FFA29A8B410
       call     CORINFO_HELP_COUNTPROFILE32
       xor      eax, eax

G_M000_IG11:                ;; offset=0x00C1
       add      rsp, 128
       pop      rbp
       ret

G_M000_IG12:                ;; offset=0x00CA
       call     CORINFO_HELP_RNGCHKFAIL
       int3

However, unlike the array-based version, the JIT promotes the method to Tier1 as the loop is running:

; Tier1 code

G_M000_IG03:                ;; offset=0x000C
       cmp      dword ptr [rax+rdx], 44
       je       SHORT G_M000_IG04
       add      rdx, 4
       dec      ecx
       jne      SHORT G_M000_IG03
       jmp      SHORT G_M000_IG06

G_M000_IG04:                ;; offset=0x001C
       mov      eax, 1

G_M000_IG05:                ;; offset=0x0021
       ret

G_M000_IG06:                ;; offset=0x0022
       xor      eax, eax

G_M000_IG07:                ;; offset=0x0024
       ret

Now we clearly see that there is significantly less code in Tier1, and the JIT has figured out that there is no need for range checks because a foreach loop over a Span<T> can’t go out of range.

Here is the benchmark we will be using to profile this scenario:

[MemoryDiagnoser]
public class Benchmark
{
    private readonly int[] _array = Enumerable.Range(0, 1_000_000).ToArray();

    [Benchmark]
    public bool ContainsArray()
    {
        return ContainsArray(_array, 1, 999_998);
    }
    
    [Benchmark]
    public bool ContainsSpan()
    {
        return ContainsSpan(_array.AsSpan()[1..999_999]);
    }
    
    [MethodImpl(MethodImplOptions.NoInlining)]
    private bool ContainsArray(int[] array, int startIndex, int length)
    {
        for (int i = startIndex; i < startIndex + length; i++)
        {
            if (array[i] == 876876)
            {
                return true;
            }
        }
    
        return false;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private bool ContainsSpan(Span<int> span)
    {
        foreach (var x in span)
        {
            if (x == 876876)
            {
                return true;
            }
        }
    
        return false;
    }
}

| Method        | Mean     | Error   | StdDev  | Allocated |
|-------------- |---------:|--------:|--------:|----------:|
| ContainsArray | 242.6 us | 1.32 us | 1.23 us |         - |
| ContainsSpan  | 162.8 us | 1.66 us | 1.30 us |         - |

Here, we see that the version with Span<T> takes ~32.9% less time (about 1.5x faster) even though there are no allocations in either case. This performance gain is largely thanks to tiered compilation, which is enabled by default starting from .NET Core 3.0. With spans, the JIT can apply more aggressive optimizations such as eliminating range checks.
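Existing call sites that pass an array with a start index and a length do not need to change to benefit from this. One option, sketched below under the assumption that the searched value becomes a parameter, is to keep the array-based signature as a thin wrapper that forwards to the span-based method:

```csharp
using System;

static bool ContainsSpan(ReadOnlySpan<int> span, int value)
{
    foreach (var x in span)
    {
        if (x == value) return true;
    }

    return false;
}

// Same shape as the array-based method; AsSpan validates the range once up front.
static bool ContainsArray(int[] array, int startIndex, int length, int value)
    => ContainsSpan(array.AsSpan(startIndex, length), value);

int[] array = [42, 43, 44, 45, 46];

bool found   = ContainsArray(array, 1, 3, 44);  // true
bool missing = ContainsArray(array, 1, 3, 46);  // false: index 4 is outside the slice

Console.WriteLine($"{found} {missing}");  // True False
```

For simple value searches like this one, the built-in MemoryExtensions.Contains extension method on spans is also worth considering before writing the loop by hand.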

Conclusion

Span<T> and ReadOnlySpan<T> are powerful tools in .NET that offer significant performance benefits, especially in high-throughput or memory-sensitive scenarios. By enabling allocation-free slicing, iteration, and transformation of data, spans allow developers to write more efficient code. The benchmarks clearly show consistent performance gains with zero allocations compared to traditional approaches.

These advantages come with some caveats, such as the need to manage stack size carefully when using stackalloc. But when used thoughtfully, spans can unlock substantial improvements in performance and memory usage.

Mastering Span is a valuable step toward writing high-performance, allocation-efficient .NET code. You can contact us at Trailhead if you’d like help making sure your .NET applications are as efficient as possible. We’ll be happy to help!

Nick Kovalenko

Nick is an experienced .NET software engineer with a B.S. in Software Engineering. He specializes in creating maintainable and performant code, having designed, developed, and maintained multiple complex systems throughout the entire development lifecycle. While his primary expertise lies in .NET, he is also proficient in JavaScript. Outside of his professional work, Nick enjoys hiking and playing the piano.
