Introduced in .NET Core 2.1, Span<T> and ReadOnlySpan<T> have become widely adopted in .NET applications, both in user code and in the BCL. But what exactly are these types? Why is everybody talking about them? And when and how should you use them?
In this blog post, I’ll answer all of these questions and explore the performance benefits they bring to the table.
What is Span<T>?
Span<T> was introduced in .NET Core 2.1 with C# 7.3, and is treated as one of the fundamental additions to the .NET platform despite being pretty simple from the implementation standpoint. It is a type-safe representation of a contiguous region of memory. The region is arbitrary and there are multiple ways to specify it, e.g. using arrays, strings, pointers and more.
Span<T> offers a rich API that resembles that of arrays, making it intuitive to use for developers already familiar with array concepts, even though there are some important nuances to Span<T> that you should understand before using it.
This is what the internal state of Span<T> looks like:
readonly ref struct Span<T>
{
readonly ref T _reference;
readonly int _length;
}
The region is represented by a reference to its first element and its length. Operations like element access, enumeration, and search are then implemented by computing offsets from this reference.
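To make the "reference plus length" idea concrete, here is a minimal sketch that wraps an array in a span and slices it. The variable names are illustrative only. It shows that a slice is a view over the same memory rather than a copy, and that indexing is still bounds-checked.

```csharp
using System;

int[] numbers = { 10, 20, 30, 40, 50 };

// Wrapping the array does not copy it: the span refers to the same memory.
Span<int> all = numbers;

// Slice(1, 3) covers numbers[1]..numbers[3]; conceptually this is just
// a new reference (to numbers[1]) plus a new length (3).
Span<int> middle = all.Slice(1, 3);

// Writes through the span are visible in the original array.
middle[0] = 99;

Console.WriteLine(numbers[1]);    // 99
Console.WriteLine(middle.Length); // 3

// Out-of-range access is checked, just like with arrays.
try
{
    _ = middle[3];
}
catch (IndexOutOfRangeException)
{
    Console.WriteLine("index 3 is outside the 3-element span");
}
```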
Using Span<T>
Span<T> is powerful because it can be used in many common scenarios with little to no overhead compared to arrays, lists, or interfaces like ICollection<T> or IList<T>. Spans can also be used for lower level constructs like pointers, inline arrays or buffers.
There are multiple reasons to prefer using Span<T> over these alternatives:
- It is a ref struct: there is no GC overhead
- It has an intuitive API
- It is recognized by the C# compiler: typical operations like foreach work out of the box
- It is optimized by the .NET runtime: certain usage patterns may trigger runtime-level optimizations
- It enables uniform handling of different data sources: arrays, spans, pointers, etc.
There is also a read-only version of the type, ReadOnlySpan<T>. It follows the same idea as Span<T> but has a thinner API that does not allow writing to the underlying memory region. One of the primary use cases for ReadOnlySpan<T> is strings: there is an implicit conversion from string to ReadOnlySpan<char>, and we are going to look into this shortly.
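As a quick illustration of the string conversion, the sketch below parses the parts of a date string by slicing a ReadOnlySpan<char> instead of calling Substring. The input format and variable names are assumptions made for the example.

```csharp
using System;

string date = "2024-05-17";

// Implicit conversion: no copy is made, the span views the string's characters.
ReadOnlySpan<char> chars = date;

// Slicing a span allocates nothing (Substring would create new strings).
int year = int.Parse(chars.Slice(0, 4));
int month = int.Parse(chars.Slice(5, 2));
int day = int.Parse(chars.Slice(8, 2));

Console.WriteLine($"{year} / {month} / {day}"); // 2024 / 5 / 17

// Going back to a string is explicit and does allocate.
string monthText = chars.Slice(5, 2).ToString();
Console.WriteLine(monthText); // 05
```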
Ultimately, spans are used extensively in .NET, and leveraging them may significantly improve performance and optimize memory allocations in hot paths.
We’re now going to take a look at how to use spans with common data types, along with benchmarks for common scenarios.
Span<T> vs T[]
One of the most common use cases for Span<T> is working with arrays. Arrays can be implicitly cast to Span<T> because they represent a contiguous region of memory.
Consider a method from a third-party library that we don’t control. The method XORs the elements of the array and returns the result:
public static int XorArray(int[] array)
{
int result = 0;
foreach (var x in array) result ^= x;
return result;
}
Our use case is that there is an array in our system, and we want to compute the sum of the XORs of its two halves:
int[] array = Enumerable.Range(0, 100).ToArray();
int xorSum = XorArray(array[..50]) + XorArray(array[50..]);
This code will work just fine, but it has a cost: a new instance of the array has to be created every time the range ([..]) operator is used. So we end up doing two allocations every time this operation is performed. Now, suppose this method is declared like this, using Span<int>:
public static int XorSpan(Span<int> span)
{
int result = 0;
foreach (var x in span) result ^= x;
return result;
}
Then, we could rewrite the code using AsSpan() and pass that to the method instead:
int[] array = Enumerable.Range(0, 100).ToArray();
int xorSum = XorSpan(array.AsSpan()[..50]) + XorSpan(array.AsSpan()[50..]);
This version of the code prevents the unnecessary allocations because it uses the range operator on a Span<int> instead of an array, while the result stays the same. Even better, if the library method accepts a Span<int>, we can use it not only for arrays, but for all types that can be represented as a Span<int>.
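The "all types that can be represented as a span" point can be sketched as follows. The helper name Xor is hypothetical, and it takes a ReadOnlySpan<int> (rather than the Span<int> used above) because it only reads the data. The same method is fed from an array, an array slice, a List<int> (via CollectionsMarshal.AsSpan) and stack memory.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical read-only variant of the XOR helper.
static int Xor(ReadOnlySpan<int> span)
{
    int result = 0;
    foreach (int x in span) result ^= x;
    return result;
}

int[] array = { 1, 2, 3, 4 };
List<int> list = new() { 1, 2, 3, 4 };
Span<int> onStack = stackalloc int[] { 1, 2, 3, 4 };

int fromArray = Xor(array);              // implicit T[] -> ReadOnlySpan<T>
int fromSlice = Xor(array.AsSpan(2));    // only elements 3 and 4
// View over the list's backing storage; the list must not be resized while the span is in use.
int fromList = Xor(CollectionsMarshal.AsSpan(list));
int fromStack = Xor(onStack);            // implicit Span<T> -> ReadOnlySpan<T>

Console.WriteLine($"{fromArray} {fromSlice} {fromList} {fromStack}"); // 4 7 4 4
```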
Let’s compare the performance of these two approaches using BenchmarkDotNet:
[MemoryDiagnoser]
public class Benchmark
{
private readonly int[] _array = Enumerable.Range(0, 1_000_000).ToArray();
[Benchmark]
public int XorArray()
{
return XorArray(_array[..500_000]) + XorArray(_array[500_000..]);
}
[Benchmark]
public int XorSpan()
{
return XorSpan(_array.AsSpan()[..500_000]) + XorSpan(_array.AsSpan()[500_000..]);
}
private int XorArray(int[] array)
{
int result = 0;
foreach (var x in array)
{
result ^= x;
}
return result;
}
private int XorSpan(Span<int> span)
{
int result = 0;
foreach (var x in span)
{
result ^= x;
}
return result;
}
}
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|--------- |---------:|--------:|--------:|---------:|---------:|---------:|----------:|
| XorArray | 574.8 μs | 7.87 μs | 7.37 μs | 663.0859 | 663.0859 | 663.0859 | 4000255 B |
| XorSpan | 185.6 μs | 2.04 μs | 1.90 μs | - | - | - | - |
The version that uses spans is allocation-free, while the version that uses arrays allocates 4 MB (two copies of 500,000 ints each) just to do this simple calculation. On top of that, the span-based method takes ~68% less time, which makes it roughly 3.1x faster.
Throughout this post we’ll see more benchmarks, and as it turns out, Span<T> is faster not only because of memory allocations.
ReadOnlySpan<T> vs string
As mentioned earlier, there is an implicit conversion from string to ReadOnlySpan<char> (not the other way around), which is possible due to how string is represented in .NET: it is essentially a read-only array of characters.
Most of the operations on string are available on ReadOnlySpan<char>, and sometimes the span API is even more useful. For example, there is the EnumerateLines() method:
foreach (var lineSpan in _string.AsSpan().EnumerateLines())
{
// ..
}
Many common operations on strings allocate other strings. The notorious example is string.Split:
int words = 0;
foreach (string _ in _string.Split(' ')) words++;
return words;
In this code, a new string is allocated for every word. Plus, the resulting array from Split() is also allocated.
Now compare that with the span-based version:
int words = 0;
foreach (Range _ in _string.AsSpan().Split(' ')) words++;
return words;
Notice how the Split method uses Range values instead of strings to represent slices of the original span. This allows an allocation-free split, and the actual words can then be accessed by slicing the span:
ReadOnlySpan<char> span = _string.AsSpan();
foreach (Range range in span.Split(' '))
{
var word = span[range];
// use 'word' as a span
}
Under the hood, this version uses the SpanSplitEnumerator<T> struct instead of an array, which means there are no heap allocations at all.
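As a small worked example of slicing by Range, the sketch below sums comma-separated integers without creating any intermediate strings. It assumes a runtime where this Split overload for spans is available (to my knowledge .NET 9 or later); the input is made up for illustration.

```csharp
using System;

ReadOnlySpan<char> csv = "10,20,30,40";

int sum = 0;
foreach (Range range in csv.Split(','))
{
    // csv[range] is a slice of the original characters, not a new string,
    // and int.Parse has an overload that accepts ReadOnlySpan<char>.
    sum += int.Parse(csv[range]);
}

Console.WriteLine(sum); // 100
```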
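Another common case where spans can help is with regular expressions. The Regex.Matches method returns a MatchCollection, which includes not just the matched strings but related metadata stored in Match objects, each of which allocates. The span-based Regex.EnumerateMatches method instead yields lightweight ValueMatch structs that carry only the index and length of each match.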
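Here is a minimal sketch of the span-based regex API: Regex.EnumerateMatches yields ValueMatch structs, and the matched text is recovered by slicing the input span. The sample text is a shortened version of the string used in the benchmarks.

```csharp
using System;
using System.Text.RegularExpressions;

ReadOnlySpan<char> text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do";

int count = 0;
int totalLength = 0;
foreach (ValueMatch match in Regex.EnumerateMatches(text, @"\b\w+(?=,)"))
{
    // ValueMatch exposes Index and Length only; slicing the input
    // gives the matched word without allocating a string.
    ReadOnlySpan<char> word = text.Slice(match.Index, match.Length);
    count++;
    totalLength += word.Length;
}

// The pattern matches words directly followed by a comma: "amet" and "elit".
Console.WriteLine($"{count} matches, {totalLength} chars"); // 2 matches, 8 chars
```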
Let’s run some benchmarks to compare performance of common string operations using string vs ReadOnlySpan<char>:
[MemoryDiagnoser]
public class Benchmark
{
private readonly string _string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua!";
[Benchmark]
public int IndexOfSpan()
{
return _string.AsSpan().IndexOf("elit");
}
[Benchmark]
public int IndexOfString()
{
return _string.IndexOf("elit");
}
[Benchmark]
public ReadOnlySpan<char> RangeSpan()
{
var ipsum = _string.AsSpan()[20..42];
return ipsum;
}
[Benchmark]
public string RangeString()
{
var ipsum = _string[20..42];
return ipsum;
}
[Benchmark]
public int SplitSpan()
{
int words = 0;
foreach (var _ in _string.AsSpan().Split(' '))
{
words++;
}
return words;
}
[Benchmark]
public int SplitString()
{
int words = 0;
foreach (var _ in _string.Split(' '))
{
words++;
}
return words;
}
[Benchmark]
public int RegexMatchSpan()
{
int matches = 0;
foreach (var _ in Regex.EnumerateMatches(_string.AsSpan(), @"\b\w+(?=,)"))
{
matches++;
}
return matches;
}
[Benchmark]
public int RegexMatchString()
{
int matches = 0;
foreach (var _ in Regex.Matches(_string, @"\b\w+(?=,)"))
{
matches++;
}
return matches;
}
}
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|----------------- |--------------:|-----------:|-----------:|-------:|-------:|----------:|
| IndexOfSpan | 4.1893 ns | 0.0277 ns | 0.0259 ns | - | - | - |
| IndexOfString | 64.4326 ns | 0.1520 ns | 0.1348 ns | - | - | - |
| RangeSpan | 0.3720 ns | 0.0043 ns | 0.0040 ns | - | - | - |
| RangeString | 4.1759 ns | 0.0529 ns | 0.0494 ns | 0.0038 | - | 72 B |
| SplitSpan | 94.3495 ns | 0.5499 ns | 0.5144 ns | - | - | - |
| SplitString | 110.9059 ns | 0.6182 ns | 0.5481 ns | 0.0463 | 0.0001 | 872 B |
| RegexMatchSpan | 1,908.3754 ns | 11.5733 ns | 10.8257 ns | - | - | - |
| RegexMatchString | 2,070.0914 ns | 7.5323 ns | 6.6772 ns | 0.0305 | - | 592 B |
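The span-based versions consistently outperform their string counterparts, sometimes by a wide margin. They perform zero allocations, while the string versions often incur both object and array allocations. Even IndexOf, which is usually lightweight, shows a noticeable speedup with spans, although part of that gap comes from comparison semantics rather than from spans themselves: string.IndexOf(string) performs a culture-sensitive search by default, whereas the span overload is ordinal.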
Span<T> with stackalloc
Spans can be directly backed by stack-allocated memory using the stackalloc keyword:
Span<byte> bytes = stackalloc byte[1024];
This operation is significantly faster than any heap allocation and helps optimize hot paths where the collection is short-lived and scoped to a method.
NOTE: you should always keep in mind that using stackalloc may result in StackOverflowException if the allocated region is too large. Allowing an open-ended number of elements in the span while using stackalloc makes the system vulnerable to denial-of-service (DoS) attacks. Additionally, avoid using stackalloc inside loops, as repeated stack allocations can quickly exhaust stack memory. Consider extracting stackalloc outside the loop.
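One way to guard against this is to use stackalloc for smaller buffers and fall back to heap-based memory (e.g. via ArrayPool<T>) when the size exceeds a safe threshold:
const int threshold = 4096;
byte[]? rented = null; // keep the pooled array so it can be returned later
Span<byte> span = sizeof(byte) * length <= threshold
? stackalloc byte[length]
: (rented = ArrayPool<byte>.Shared.Rent(length)).AsSpan(0, length); // Rent may return a larger array
// ... work with span ...
if (rented is not null) ArrayPool<byte>.Shared.Return(rented);
This approach gives you the best of both worlds: stack-based allocation when working with small spans, and pooled heap memory for larger sizes. Remember to return the rented array to the pool once you are done with it.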
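For completeness, here is a fuller sketch of the fallback pattern wrapped in a method. FillAndSum and the 256-byte threshold are hypothetical choices for illustration, not recommendations. The important details are that ArrayPool may hand back a larger array than requested, so the span is trimmed, and that the rented array is returned in a finally block.

```csharp
using System;
using System.Buffers;

// Hypothetical helper: writes 0, 1, 2, ... (as bytes) into a scratch buffer and sums them.
static int FillAndSum(int length)
{
    const int StackThreshold = 256; // assumed limit; tune for your workload

    byte[]? rented = null;
    Span<byte> buffer = length <= StackThreshold
        ? stackalloc byte[StackThreshold]
        : (rented = ArrayPool<byte>.Shared.Rent(length));

    // Both branches may provide more space than requested, so trim to length.
    buffer = buffer.Slice(0, length);
    try
    {
        int sum = 0;
        for (int i = 0; i < buffer.Length; i++)
        {
            buffer[i] = (byte)i;
            sum += buffer[i];
        }
        return sum;
    }
    finally
    {
        // Only pooled arrays need to be returned; stack memory is released on return.
        if (rented is not null)
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}

Console.WriteLine(FillAndSum(4));    // 6 (stack path)
Console.WriteLine(FillAndSum(1000)); // pooled path
```

Using a constant size for the stackalloc branch keeps the stack usage predictable regardless of the requested length.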
Span<T> and JIT
The .NET runtime team continuously improves the just-in-time (JIT) compiler, and one of its prominent features is dynamic profile-guided optimization (PGO). It allows the JIT to generate more efficient code over time as the application runs. Recent .NET versions have introduced many enhancements to dynamic PGO, including its handling of spans.
Dynamic PGO uses tiered compilation, which allows the JIT to substitute different assembly code for the same method during the lifetime of the application. When the app starts, the JIT emits lower quality assembly code but can recompile the methods with higher quality code over time as they become hot.
The actual assembly code that is emitted by the JIT depends on your OS and architecture. Examples in this blog post are produced on Windows 11 x64.
Suppose we have a method that checks whether an array contains a specific value. We want to keep it allocation-free, but at the same time have the ability to use it on a region of the array:
static bool ContainsArray(int[] array, int startIndex, int length)
{
for (int i = startIndex; i < startIndex + length; i++)
{
if (array[i] == 44)
{
return true;
}
}
return false;
}
Then, let’s run this method in a loop and see what the JIT emits for it:
int[] array = [42, 43, 44, 45, 46];
while (true)
{
ContainsArray(array, 1, 3);
}
; Instrumented Tier0 code
G_M000_IG03: ;; offset=0x0029
mov rax, gword ptr [rbp+0x10]
mov ecx, dword ptr [rbp-0x3C]
cmp ecx, dword ptr [rax+0x08]
jae SHORT G_M000_IG10
mov edx, ecx
lea rax, bword ptr [rax+4*rdx+0x10]
cmp dword ptr [rax], 44
jne SHORT G_M000_IG05
mov rcx, 0x7FFA29AB7970
call CORINFO_HELP_COUNTPROFILE32
mov eax, 1
G_M000_IG04: ;; offset=0x0055
add rsp, 112
pop rbp
ret
G_M000_IG05: ;; offset=0x005B
mov rcx, 0x7FFA29AB7974
call CORINFO_HELP_COUNTPROFILE32
mov eax, dword ptr [rbp-0x3C]
inc eax
mov dword ptr [rbp-0x3C], eax
G_M000_IG06: ;; offset=0x0072
mov eax, dword ptr [rbp-0x48]
dec eax
mov dword ptr [rbp-0x48], eax
cmp dword ptr [rbp-0x48], 0
jg SHORT G_M000_IG08
G_M000_IG07: ;; offset=0x0080
lea rcx, [rbp-0x48]
mov edx, 17
call CORINFO_HELP_PATCHPOINT
G_M000_IG08: ;; offset=0x008E
mov eax, dword ptr [rbp+0x18]
add eax, dword ptr [rbp+0x20]
cmp dword ptr [rbp-0x3C], eax
jl SHORT G_M000_IG03
mov rcx, 0x7FFA29AB7978
call CORINFO_HELP_COUNTPROFILE32
xor eax, eax
G_M000_IG09: ;; offset=0x00AA
add rsp, 112
pop rbp
ret
G_M000_IG10: ;; offset=0x00B0
call CORINFO_HELP_RNGCHKFAIL
int3
“Tier0” indicates that the method has just been compiled by the JIT for the first time. Even though the method logic is simple, the code is quite verbose: values are shuffled between registers and the stack with mov, profiling counters are updated, and a range check is performed on every iteration because the JIT can’t prove that i always stays within bounds. In this experiment, tiered compilation never really kicks in for this method, so it stays at its initial Tier0 throughout the loop.
Let’s consider an alternative version that uses Span<T>:
static bool ContainsSpan(Span<int> span)
{
foreach (var x in span)
{
if (x == 44)
{
return true;
}
}
return false;
}
There is no need to pass the startIndex and length because the span can be sliced allocation-free on the caller side. Then, there is the same loop in Program.cs:
int[] array = [42, 43, 44, 45, 46];
while (true)
{
var span = array.AsSpan()[1..4];
ContainsSpan(span);
}
When this version of the method is JITted in Tier0, we still see verbose code:
; Instrumented Tier0 code
G_M000_IG05: ;; offset=0x003D
mov eax, dword ptr [rbp-0x40]
cmp dword ptr [rbp-0x4C], eax
jae G_M000_IG12
mov eax, dword ptr [rbp-0x4C]
mov eax, eax
mov rcx, bword ptr [rbp-0x48]
cmp dword ptr [rcx+4*rax], 44
jne SHORT G_M000_IG07
mov rcx, 0x7FFA29A8B408
call CORINFO_HELP_COUNTPROFILE32
mov eax, 1
G_M000_IG06: ;; offset=0x006C
add rsp, 128
pop rbp
ret
G_M000_IG07: ;; offset=0x0075
mov rcx, 0x7FFA29A8B40C
call CORINFO_HELP_COUNTPROFILE32
mov eax, dword ptr [rbp-0x4C]
inc eax
mov dword ptr [rbp-0x4C], eax
G_M000_IG08: ;; offset=0x008C
mov eax, dword ptr [rbp-0x58]
dec eax
mov dword ptr [rbp-0x58], eax
cmp dword ptr [rbp-0x58], 0
jg SHORT G_M000_IG10
G_M000_IG09: ;; offset=0x009A
lea rcx, [rbp-0x58]
mov edx, 25
call CORINFO_HELP_PATCHPOINT
G_M000_IG10: ;; offset=0x00A8
mov eax, dword ptr [rbp-0x4C]
cmp eax, dword ptr [rbp-0x40]
jl SHORT G_M000_IG05
mov rcx, 0x7FFA29A8B410
call CORINFO_HELP_COUNTPROFILE32
xor eax, eax
G_M000_IG11: ;; offset=0x00C1
add rsp, 128
pop rbp
ret
G_M000_IG12: ;; offset=0x00CA
call CORINFO_HELP_RNGCHKFAIL
int3
However, unlike the array-based version, the JIT promotes the method to Tier1 as the loop is running:
; Tier1 code
G_M000_IG03: ;; offset=0x000C
cmp dword ptr [rax+rdx], 44
je SHORT G_M000_IG04
add rdx, 4
dec ecx
jne SHORT G_M000_IG03
jmp SHORT G_M000_IG06
G_M000_IG04: ;; offset=0x001C
mov eax, 1
G_M000_IG05: ;; offset=0x0021
ret
G_M000_IG06: ;; offset=0x0022
xor eax, eax
G_M000_IG07: ;; offset=0x0024
ret
Now we clearly see that there is significantly less code in Tier1, and the JIT figured out that there is no need to do range checks because a foreach loop over a Span<T> can’t go out of range.
Here is the benchmark we will be using to profile this scenario:
[MemoryDiagnoser]
public class Benchmark
{
private readonly int[] _array = Enumerable.Range(0, 1_000_000).ToArray();
[Benchmark]
public bool ContainsArray()
{
return ContainsArray(_array, 1, 999_998);
}
[Benchmark]
public bool ContainsSpan()
{
return ContainsSpan(_array.AsSpan()[1..999_999]);
}
[MethodImpl(MethodImplOptions.NoInlining)]
private bool ContainsArray(int[] array, int startIndex, int length)
{
for (int i = startIndex; i < startIndex + length; i++)
{
if (array[i] == 876876)
{
return true;
}
}
return false;
}
[MethodImpl(MethodImplOptions.NoInlining)]
private bool ContainsSpan(Span<int> span)
{
foreach (var x in span)
{
if (x == 876876)
{
return true;
}
}
return false;
}
}
| Method | Mean | Error | StdDev | Allocated |
|-------------- |---------:|--------:|--------:|----------:|
| ContainsArray | 242.6 μs | 1.32 μs | 1.23 μs | - |
| ContainsSpan | 162.8 μs | 1.66 μs | 1.30 μs | - |
Here, we see that the version with Span<T> takes ~32.9% less time (roughly 1.5x faster) even though neither version allocates. This performance gain is largely thanks to tiered compilation, which is enabled by default starting from .NET Core 3.0. With spans, the JIT can apply more aggressive optimizations such as eliminating range checks.
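If existing callers depend on the (array, startIndex, length) signature, one possible approach is to keep that method as a thin wrapper over a span-based implementation. The sketch below is an assumption about how that could look: the extra value parameter and the use of ReadOnlySpan<int> are my additions for the example.

```csharp
using System;

static bool ContainsSpan(ReadOnlySpan<int> span, int value)
{
    foreach (int x in span)
    {
        if (x == value) return true;
    }
    return false;
}

// Keeps the original (array, startIndex, length) shape for existing callers.
// AsSpan validates the range once up front instead of on every element access.
static bool ContainsArray(int[] array, int startIndex, int length, int value)
    => ContainsSpan(array.AsSpan(startIndex, length), value);

int[] array = { 42, 43, 44, 45, 46 };
Console.WriteLine(ContainsArray(array, 1, 3, 44)); // True  (43, 44, 45)
Console.WriteLine(ContainsArray(array, 3, 2, 44)); // False (45, 46)
```

For a simple equality search like this one, the built-in Contains extension method on spans is likely a better choice than a hand-written loop. The manual loop is kept here only to mirror the code above.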
Conclusion
Span<T> and ReadOnlySpan<T> are powerful tools in .NET that offer significant performance benefits, especially in high-throughput or memory-sensitive scenarios. By enabling allocation-free slicing, iteration, and transformation of data, spans allow developers to write more efficient code. The benchmarks clearly show consistent performance gains with zero allocations compared to traditional approaches.
These advantages come with some caveats, such as the need to manage stack size carefully when using stackalloc. But when used thoughtfully, spans can unlock substantial improvements in performance and memory usage.
Mastering Span is a valuable step toward writing high-performance, allocation-efficient .NET code. You can contact us at Trailhead if you’d like help making sure your .NET applications are as efficient as possible. We’ll be happy to help!


