I'm using C# (.NET 5). Imagine I have a class that stores an array of structures (say, floats):

public class StoresArray
{
    private float[] floats;
}

This class's data is loaded from a serialized binary file. To assign the floats array, I use a helper function to read bytes from the serialized file. Importantly, this function then attempts to reinterpret the loaded bytes directly as float[] rather than copying to a new array.

public static class Deserializer
{
    public static float[] Load(string file)
    {
        byte[] bytes = LoadBytesFromFile(file);

        // This is a compiler error, of course.
        return (float[])bytes;
    }
}

The intended usage is as follows:

// Within the StoresArray class...
floats = Deserializer.Load("MyFile.file");

Of note here is that I'm attempting to store the float[] as a member variable, not just iterate over the byte[] locally. As such, casting via Span<T> (Span<float> floatSpan = MemoryMarshal.Cast<byte, float>(bytes.AsSpan())) is insufficient, because a Span<T> can only live on the stack and cannot be stored in a class field. Functions associated with Memory<T>, Marshal, and MemoryMarshal have similarly failed. Of course, I could use spans (or other methods, like BitConverter or unsafe pointers) to build a new float[] from the byte[], but that would incur an additional array allocation, as well as additional operations to convert the bytes. In the context in which I'm asking (loading video game assets on the fly), I'd like to optimize performance as much as I can.
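For reference, the copying approach mentioned above might look like the following sketch. The class and method names are hypothetical, and it assumes the bytes hold raw floats in the machine's native byte order.

```csharp
using System;
using System.Runtime.InteropServices;

public static class CopyingDeserializer
{
    // Hypothetical helper: reinterprets the bytes as floats, then copies
    // them into a new float[] (the extra allocation I'd like to avoid).
    public static float[] ToFloatArray(byte[] bytes)
    {
        // Zero-copy view over the same memory. Span<T> is a ref struct,
        // so this view cannot be stored in a class field.
        ReadOnlySpan<float> view = MemoryMarshal.Cast<byte, float>(bytes.AsSpan());

        // ToArray allocates a second array and copies every element into it.
        return view.ToArray();
    }
}
```

Any trailing bytes that do not make up a whole float are dropped by MemoryMarshal.Cast.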

In modern C#, is it possible to reinterpret and store arrays of structs without incurring an additional allocation?

Grimelios
  • it depends on the format you are serializing. – Daniel A. White Dec 10 '21 at 19:30
  • What functions did you try from the `Marshal` class, and how did it "fail"? – dan04 Dec 10 '21 at 19:32
  • @dan04 The most notable `Marshal` function in this context (imo) is `PtrToStructure`, which does successfully let me create _one_ structure (`T item = Marshal.PtrToStructure<T>(new IntPtr(address))`). Unfortunately, it doesn't let me _reinterpret an array_ as I'm hoping to do. – Grimelios Dec 10 '21 at 19:45
  • Hi! I feel this question was closed prematurely (as many are). Although the linked question fundamentally boils down to the same answer (no, you can't reinterpret-cast arrays in C#), that question was asked half a decade ago, before `Span` even existed. In addition, I approached the question ("How do I reinterpret-cast an array?") from a different problem space, which may hold value to others. Finally, Matthew Watson's answer below gives an important insight (passing `T[]` directly to an input stream) not present in the other question. – Grimelios Dec 10 '21 at 20:28
  • "In the context in which I'm asking (loading video game assets on the fly), I'd like to optimize performance as much as I can." Pure memory copying is so fast that it's never going to matter, or even be detectable. – Boann Dec 11 '21 at 13:25
  • @Boann You may be right, and my question is certainly premature optimization (I haven't run into _measured_ performance problems yet). I suppose there's also an aspect of curiosity. It's interesting that reinterpret cast now exists in C# via `Span`, but seemingly only on the stack. – Grimelios Dec 11 '21 at 18:53
  • @Boann That's absolutely not the case when reading large arrays of primitives such as doubles. The regular (old-style) approach would have you using `BitConverter` to convert each `double` to a byte array for reading/writing from/to the stream. My timings with BenchmarkDotNet indicate that using `Span` with `MemoryMarshal.AsBytes()` is more than five times faster when writing and reading `MemoryStream`. – Matthew Watson Dec 11 '21 at 20:16

1 Answer

To write you can do something like this:

public static void WriteArrayToStream<T>(Stream output, T[] array) where T: unmanaged
{
    var span = array.AsSpan();
    var bytes = MemoryMarshal.AsBytes(span);
    output.Write(bytes);
}

For reading, you can do something like this:

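// Requires: using System.Runtime.CompilerServices; (for Unsafe)
public static (T[] Result, int Count) ReadArrayFromStream<T>(Stream input, int n) where T: unmanaged
{
    T[] result = new T[n];
    var bytes  = MemoryMarshal.AsBytes(result.AsSpan());

    // Stream.Read may return fewer bytes than requested, so loop until the buffer is full or the stream ends.
    int total = 0;
    while (total < bytes.Length)
    {
        int read = input.Read(bytes.Slice(total));
        if (read == 0) break; // end of stream
        total += read;
    }

    // Unsafe.SizeOf<T>() is the managed size; Marshal.SizeOf<T>() is the marshaled size, which differs for char and bool.
    return (result, total / Unsafe.SizeOf<T>());
}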

Note that it returns a tuple, because if not enough data is available, only the first Count elements will have valid data.

Here's an example to show both writing and reading a double[] array:

MemoryStream mem = new MemoryStream();
double[] data = Enumerable.Range(0, 100).Select(x => (double)x).ToArray();
WriteArrayToStream(mem, data);
Console.WriteLine(mem.Length); // 800

mem.Position       = 0;
var (array, count) = ReadArrayFromStream<double>(mem, 200);
Console.WriteLine(count); // 100
Console.WriteLine(array[42]);  // 42
Matthew Watson
  • Thank you for the answer. Unfortunately, I'm trying to _read_ the data back in this case (deserialize), not write _to_ a file (serialization). You're right that I can (thankfully) write without additional allocations. – Grimelios Dec 10 '21 at 19:40
  • @Grimelios You can read too, I'll update the answer – Matthew Watson Dec 10 '21 at 19:40
  • That's an interesting approach, passing the `T[]` (as a byte span) directly to the input stream. That... may solve my problem (as with most SO questions, I elided some complexity to simplify the question). If it does (I'll have to tinker first), I'll mark this response as accepted. Thank you! – Grimelios Dec 10 '21 at 19:57
  • After tinkering, I can confirm that the key insight from Matthew's answer (passing `Span` directly into an input stream, rather than buffering to a `byte[]` first) achieves my goal of deserializing a binary file with minimal allocations. Note that copying into a temporary buffer first (e.g. `fileStream.Read(bytes, 0, 4096)`) is still frequently useful when converting multiple distinct values from those bytes. – Grimelios Dec 11 '21 at 22:53
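
Applied to the question's `Deserializer.Load`, the approach confirmed in the last comment might look roughly like the sketch below. The class name `DirectDeserializer` is hypothetical, and the sketch assumes the file contains nothing but raw floats in the machine's native byte order (no header, no length prefix).

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public static class DirectDeserializer
{
    // Hypothetical variant of the question's Deserializer.Load.
    // Assumption: the whole file is a packed array of native-endian floats.
    public static float[] Load(string file)
    {
        using FileStream input = File.OpenRead(file);

        // The only array allocated is the destination itself.
        float[] result = new float[input.Length / sizeof(float)];

        // View the float[] as bytes so the stream can write into it directly.
        Span<byte> bytes = MemoryMarshal.AsBytes(result.AsSpan());

        // Stream.Read may return fewer bytes than requested, so keep reading.
        int total = 0;
        while (total < bytes.Length)
        {
            int read = input.Read(bytes.Slice(total));
            if (read == 0)
                throw new EndOfStreamException("File ended before the array was filled.");
            total += read;
        }

        return result;
    }
}
```

The returned `float[]` is an ordinary array, so it can be assigned to a field such as `floats` in `StoresArray` without a second copy.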