Are there any data structures for .NET (the recent versions) that have an initial "small" mode that doesn't allocate memory? Something that would use initially memory on the stack (stackalloc), then if that is all used up switch to using the heap and only then put pressure on GC. A SmallList<T>
with a a similar interface to List<T>
would be the most useful when you know in most cases only a few items are added.
In C++ there is at least one good implementation of this idea used in LLVM, the SmallVector and some similar small types for sets/maps.
What is the difference between std::vector and llvm::SmallVector? which one to use when?
Also this video: CppCon 2016: Chandler Carruth “High Performance Code 201: Hybrid Data Structures"
I've been googling a bit and couldn't find anything about an existing implementation, and if it's even possible to do. It would have to be limited to value types more than certainly, but even then not sure how you would write some T value into the stackalloc array and read it back - for ex. if T is an int, is there now an API to "reinterpret_cast" a range of bytes from a span to an int directly, not by combining the 4 bytes one by one?
The closest data-struct with this idea I could find is the SStringBuilder from the Towel lib: https://github.com/ZacharyPatten/Towel/blob/main/Sources/Towel/SStringBuilder.cs
// SStringBuilder is a small helper for initializing strings.
// It will append to the span until the capacity is reached
// and then it will revert to a StringBuilder if necessary
Edit
Found this API, may be what's needed
https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.memorymarshal.read
Edit 2
Made a hacky SmallList that just works with the small number of values, throwing an exception if capacity is reached instead of switching to a List. Tried a few ways of doing this:
- with Span and using MemoryMarshal.Read/Write
- with Span and read/write directly the span
- with ArrayPool.Shared.Rent/Return
- with a List
- with a List with capacity preallocated in constructor
Up to close to 32 values, the Span versions are quite a lot faster, below are some results from BenchmarkDotNet - bench just adds N small structs (2 int fields) to the list and then gets each value and adds up the fields. N was 1,2,4... and a few more up to 128, was curious at what point the advantage disappears, in practice the stack part should be for < 10 values.
Results for 2 values:
| SpanListBench | 2 | 6.126 ns | 0.0188 ns | 0.0147 ns |
| TypedSpanlListBench | 2 | 5.403 ns | 0.0147 ns | 0.0123 ns |
| PoolListBench | 2 | 24.182 ns | 0.0375 ns | 0.0351 ns |
| ListBench | 2 | 18.160 ns | 0.1737 ns | 0.1540 ns |
| ListBenchPrealloc | 2 | 13.754 ns | 0.0978 ns | 0.0867 ns |
Results for 8 values:
| SpanListBench | 8 | 18.045 ns | 0.0494 ns | 0.0438 ns |
| TypedSpanListBench | 8 | 16.467 ns | 0.0565 ns | 0.0472 ns |
| PoolListBench | 8 | 31.558 ns | 0.0735 ns | 0.0651 ns |
| ListBench | 8 | 44.527 ns | 0.1977 ns | 0.1849 ns |
| ListBenchPrealloc | 8 | 29.989 ns | 0.2079 ns | 0.1736 ns |
ref struct TypedSpanList<T> where T:struct
{
private Span<T> buffer_;
private int count_;
public TypedSpanList(Span<T> buffer)
{
buffer_ = buffer;
count_ = 0;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void Add(T value)
{
if (count_ >= buffer_.Length)
{
throw new InvalidOperationException();
}
buffer_[count_] = value;
count_++;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public T Get(int index)
{
if (index >= buffer_.Length)
{
throw new InvalidOperationException();
}
return buffer_[index];
}
}
[Benchmark]
public int TypedListBench()
{
int count = 0;
Span<Test> s = stackalloc Test[Number];
var sl = new TypedSpanList<Test>(s);
for (int i = 0; i < Number; i++)
{
sl.Add(new Test() { a = i, b = i + 1 });
}
for (int i = 0; i < Number; i++)
{
var t = sl.Get(i);
count += t.a + t.b;
}
return count;
}