
What is the optimal way to get a TextReader instance from a Memory<byte> object?

I could write something like:

using (var stream = new MemoryStream(body.ToArray()))
using (var reader = new StreamReader(stream))
{
}

but maybe there is a better way?

obratim

1 Answer


StreamReader disposes the underlying Stream automatically when the reader itself is disposed, so a separate using for the MemoryStream is not required.

#1 The simplest way

Memory<byte> memory = GetSomeData();
using TextReader reader = new StreamReader(new MemoryStream(memory.ToArray()));
// some code

But here you're copying the whole memory content into a new array, which consumes memory and gives the garbage collector more work. This is not recommended, especially if the array contains a large amount of data.
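To get a rough feel for the cost of that copy, here is a minimal sketch that measures how many bytes the current thread allocates while running a piece of code. `AllocationSketch` and `MeasureAllocatedBytes` are hypothetical names, the 1,000,000-byte buffer is an arbitrary assumption, and the exact number printed will vary by runtime:

```csharp
using System;

public static class AllocationSketch
{
    // Returns the number of bytes allocated on the current thread while 'action' runs.
    public static long MeasureAllocatedBytes(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        // Assumed buffer size, chosen only for illustration.
        Memory<byte> memory = new byte[1_000_000];

        // ToArray allocates a fresh array and copies every byte into it.
        long copied = MeasureAllocatedBytes(() => memory.ToArray());

        Console.WriteLine("Bytes allocated by ToArray(): {0}", copied);
    }
}
```

The reported value should be at least the length of the buffer, since `ToArray()` has to allocate a new array of the same size.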

There's another way of doing it that does not allocate a new array.

#2 The optimal way (recommended to save memory)

Memory<byte> memory = GetSomeData();
if (MemoryMarshal.TryGetArray(memory, out ArraySegment<byte> segment))
{
    using TextReader reader = new StreamReader(new MemoryStream(segment.Array, segment.Offset, segment.Count));
    // some code
}

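In other words, TryGetArray exposes the array that already backs the Memory<byte> as an ArraySegment<byte>, without copying it. It returns false when the memory is not backed by a managed array, in which case this snippet creates no reader.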
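Because `MemoryMarshal.TryGetArray` can return `false` (for example, when the memory comes from a custom `MemoryManager<byte>`), it may be worth falling back to the copying approach in that case. A minimal sketch, assuming UTF-8 as the default encoding; `MemoryReaderFactory` and `CreateReader` are hypothetical names, not part of .NET:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;

public static class MemoryReaderFactory
{
    // Creates a TextReader over 'memory', avoiding a copy when an array backs it.
    public static TextReader CreateReader(ReadOnlyMemory<byte> memory, Encoding encoding = null)
    {
        encoding ??= Encoding.UTF8; // assumed default, pass another encoding if needed

        MemoryStream stream = MemoryMarshal.TryGetArray(memory, out ArraySegment<byte> segment)
            // Fast path: wrap only the window of the existing array, read-only.
            ? new MemoryStream(segment.Array, segment.Offset, segment.Count, writable: false)
            // Fallback: no backing array is available, so copy the bytes.
            : new MemoryStream(memory.ToArray(), writable: false);

        // Disposing the reader also disposes the stream.
        return new StreamReader(stream, encoding);
    }
}
```

Usage would then look like `using TextReader reader = MemoryReaderFactory.CreateReader(GetSomeData());`. Taking `ReadOnlyMemory<byte>` lets the helper accept both `Memory<byte>` and `ReadOnlyMemory<byte>`, since the former converts implicitly.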

Tests

Here's an example to experiment with (a .NET Core 3.1 console application).

using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        string text = "Hello World!";
        byte[] data = Encoding.UTF8.GetBytes(text);
        Memory<byte> memory = data;

        byte[] data1 = memory.ToArray();
        Console.WriteLine("data == data1: {0}", data == data1);
            
        if (MemoryMarshal.TryGetArray(memory, out ArraySegment<byte> segment))
        {
            byte[] data2 = segment.Array;
            Console.WriteLine("data == data2: {0}", data == data2);
        }

        Console.WriteLine();

        Console.WriteLine("Test 1");
        Test1(text);

        Console.WriteLine();

        Console.WriteLine("Test 2");
        Test2(text);

        Console.ReadKey();
    }

    private static void Test1(string text)
    {
        Memory<byte> memory = Encoding.UTF8.GetBytes(text);
        byte[] data = memory.ToArray();
        ReadItTwice(memory, data);
    }

    private static void Test2(string text)
    {
        Memory<byte> memory = Encoding.UTF8.GetBytes(text);
        if (MemoryMarshal.TryGetArray(memory, out ArraySegment<byte> segment))
        {
            byte[] data = segment.Array;
            ReadItTwice(memory, data);
        }
    }

    private static void ReadItTwice(Memory<byte> memory, byte[] data)
    {
        using MemoryStream ms = new MemoryStream(data);
        using TextReader sr = new StreamReader(ms);
        Console.WriteLine("Before change: {0}", sr.ReadToEnd());
        if (MemoryMarshal.TryGetArray(memory, out ArraySegment<byte> segment))
            segment.Array[0] = (byte)'_'; // change first symbol
        ms.Position = 0;
        sr.DiscardBufferedData(); // resync the reader after repositioning the stream
        Console.WriteLine("After change: {0}", sr.ReadToEnd());
    }
}

Output

data == data1: False
data == data2: True

Test 1
Before change: Hello World!
After change: Hello World!

Test 2
Before change: Hello World!
After change: _ello World!
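
The test above uses a `Memory<byte>` that covers its whole array. As a further sketch (the `SliceDemo` class and its method names are hypothetical), the following shows why `segment.Offset` and `segment.Count` matter when the memory is a slice of a larger array:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;

public static class SliceDemo
{
    // Reads only the window of the backing array that the Memory<byte> represents.
    public static string ReadWindow(Memory<byte> memory)
    {
        if (!MemoryMarshal.TryGetArray(memory, out ArraySegment<byte> segment))
            throw new InvalidOperationException("Memory is not backed by an array.");

        using var reader = new StreamReader(
            new MemoryStream(segment.Array, segment.Offset, segment.Count));
        return reader.ReadToEnd();
    }

    // Ignores Offset and Count, so it reads the entire backing array.
    public static string ReadWholeArray(Memory<byte> memory)
    {
        if (!MemoryMarshal.TryGetArray(memory, out ArraySegment<byte> segment))
            throw new InvalidOperationException("Memory is not backed by an array.");

        using var reader = new StreamReader(new MemoryStream(segment.Array));
        return reader.ReadToEnd();
    }

    public static void Main()
    {
        byte[] data = Encoding.UTF8.GetBytes("Hello World!");
        var slice = new Memory<byte>(data, 6, 5); // covers "World"

        Console.WriteLine(ReadWindow(slice));     // World
        Console.WriteLine(ReadWholeArray(slice)); // Hello World!
    }
}
```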
EM0
aepot
  • @obratim Copying the array is a fast bulk memory copy, so the copy itself is rarely the bottleneck. The cost is the additional allocation, which the GC must collect later. GC work is expensive, especially when each array is 85,000 bytes or larger and lands on the Large Object Heap. You're comparing the speed of a copying method against a non-copying one, but this is a memory-bound issue, not a CPU-bound one. For example, if you need to perform this operation 100k times per second on 100 KB arrays, which do you prefer: `100k x 100 KB = 10 GB` of additional allocations per second, or zero? – aepot Aug 13 '20 at 14:22
  • I've now benchmarked it properly: the MemoryMarshal way seems slower, but it allocates less memory. benchmark: https://mega.nz/file/htxjHa7T#POtmGcZzTq60u2T1m9-x2QTPG-9Hnx1TawhCQoiPoiw results: https://mega.nz/file/04hTVA4K#R3ahehgUoXp3IOAHNdBi-T5tfNDUc0Nb6LvoSMFc7MY – obratim Aug 13 '20 at 14:42
  • As you explained, the benchmark doesn't measure the time spent on garbage collection, which is why the MemoryMarshal way seems slower. – obratim Aug 13 '20 at 14:43
  • @obratim Just one note: pay attention to `segment.Offset` and `segment.Count` of the `ArraySegment`. `MemoryMarshal.TryGetArray` returns the whole array that was used to create the `Memory` as `segment.Array`, so `segment.Array.Length` and `segment.Count` can differ. Just a tip. – aepot Aug 13 '20 at 15:10
  • Indeed, so it's safer to pass `segment.Offset` and `segment.Count` to the `MemoryStream` constructor to read only the part of the array represented by the `ArraySegment`. I've edited that into the answer. For example, if you create `new Memory<byte>(data, 6, 5)`, then way #1 would read "World", while the original way #2 would have read "Hello World!". – EM0 Sep 08 '22 at 15:40