I've known that GetBuffer() on a MemoryStream in C#/.NET has to be used with care because, as the documentation describes, there can be unused bytes at the end, so you have to be sure to look only at the first MemoryStream.Length bytes in the buffer.

But yesterday I ran into a case where the bytes at the beginning of the buffer were junk! Indeed, if you use a tool like .NET Reflector and look at ToArray(), you can see this:

public virtual byte[] ToArray()
{
    byte[] dst = new byte[this._length - this._origin];
    Buffer.InternalBlockCopy(this._buffer, this._origin, dst, 0,
        this._length - this._origin);
    return dst;
}

So to do anything with the buffer returned by GetBuffer(), you really need to know _origin. The only problem is that _origin is private and there's no way to get at it...

So my question is: what use is GetBuffer() on a MemoryStream without some a priori knowledge of how the MemoryStream was constructed (which is what sets _origin)?

(It is this constructor, and only this constructor, that sets _origin; it is meant for when you want a MemoryStream around a byte array starting at a particular index in that array:

public MemoryStream(byte[] buffer, int index, int count, bool writable, bool publiclyVisible)

)
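A minimal sketch of the situation described above, assuming a stream built with that five-argument constructor and an index of 2. `OriginDemo` and `Describe` are hypothetical names used only for illustration. It shows that GetBuffer() hands back the whole underlying array, so element 0 of the returned buffer is not the stream's first byte, while ToArray() starts copying at _origin.

```csharp
using System;
using System.IO;

// Hypothetical helper class for illustration only.
public static class OriginDemo
{
    // Wraps bytes 2..4 of a larger array in a MemoryStream and reports what
    // GetBuffer() and ToArray() each return as the stream's "first" byte.
    public static void Describe(out byte firstFromGetBuffer, out byte firstFromToArray, out long length)
    {
        // 0xAA and 0xBB stand in for unrelated bytes that precede the stream's data.
        byte[] backing = { 0xAA, 0xBB, 1, 2, 3 };

        // index 2, count 3: internally _origin becomes 2.
        // publiclyVisible must be true, otherwise GetBuffer() throws UnauthorizedAccessException.
        using (var stream = new MemoryStream(backing, 2, 3, writable: true, publiclyVisible: true))
        {
            byte[] raw = stream.GetBuffer();  // the whole backing array, not a copy
            byte[] copy = stream.ToArray();   // a copy that starts at _origin and is Length bytes long

            firstFromGetBuffer = raw[0];      // 0xAA: junk from the stream's point of view
            firstFromToArray = copy[0];       // 1: the stream's real first byte
            length = stream.Length;           // 3
        }
    }
}
```

Nothing in the array returned by GetBuffer() tells the caller that the valid data begins at index 2.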

kmatyaszek
aggieNick02
  • Looking at the source code, _origin is always 0 in practice for MemoryStreams where the buffer was allocated by MemoryStream itself. But of course, relying on this would not be very robust. I see MemoryStream has an internal method 'InternalGetOriginAndLength' that would be usable if it were public! – avl_sweden Nov 08 '16 at 07:34
  • I mean, relying on it always being 0 would be non-robust because you (or someone else) may someday modify your program to create a stream with a non-zero origin. I don't think Microsoft will ever change MemoryStream; too many programs would probably break. – avl_sweden Nov 08 '16 at 07:36

7 Answers

The answer is in the GetBuffer() MSDN documentation; you might have missed it.

When you create a MemoryStream without providing a byte array (byte[]):

it creates an expandable capacity initialized to zero.

In other words, the MemoryStream allocates a byte[] of suitable size once Write is called on the stream.

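Thus, with GetBuffer() you can directly access the underlying array and read from it. Because the stream allocated that array itself, the valid data starts at index 0 and runs for Length bytes.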

This is useful when you receive a stream without knowing its size in advance. If the stream is usually very large, calling GetBuffer() is much cheaper than calling ToArray(), which copies the data under the hood, as the documentation notes:

To obtain only the data in the buffer, use the ToArray method; however, ToArray creates a copy of the data in memory.

I wonder at which point you called GetBuffer() and got junk data at the beginning. It could be between two Write calls, where the data from the first one had been garbage collected, but I'm not sure that could happen.
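A minimal sketch of the case this answer describes, assuming the MemoryStream was created with the parameterless constructor so that it allocates its own array (and _origin is therefore 0). `BufferReader`, `Fill`, and `Sum` are hypothetical names; `Sum` stands in for whatever code needs the bytes and reads them in place instead of calling ToArray().

```csharp
using System;
using System.IO;

// Hypothetical helper class for illustration only.
public static class BufferReader
{
    // Hypothetical consumer: sums the bytes the stream actually holds, without copying them.
    // Assumption: 'stream' was created with new MemoryStream(), so its data starts
    // at index 0 of the array returned by GetBuffer().
    public static long Sum(MemoryStream stream)
    {
        byte[] buffer = stream.GetBuffer();      // may be longer than the data (spare capacity)
        int length = checked((int)stream.Length);
        long total = 0;
        for (int i = 0; i < length; i++)         // only the first Length bytes are valid
        {
            total += buffer[i];
        }
        return total;
    }

    // Hypothetical producer: writes 'count' bytes into an expandable stream.
    public static MemoryStream Fill(int count)
    {
        var stream = new MemoryStream();         // expandable, capacity starts at 0
        for (int i = 0; i < count; i++)
        {
            stream.WriteByte((byte)(i % 256));
        }
        return stream;
    }
}
```

The spare capacity at the end of the array is why the loop is bounded by Length rather than by buffer.Length.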

l33t
ForceMagic
  • This does not seem to address the question at all. The OP's point is: what good is direct access to the array containing the data if you don't know the valid start index of that array? – Vikhram Feb 18 '16 at 17:07
  • @Vikhram The question is: "When is GetBuffer() on MemoryStream ever useful?" I explain a pretty good example of when this can be useful. – ForceMagic Feb 18 '16 at 17:31

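If you really want the internal _origin value, you can call MemoryStream.Seek(0, SeekOrigin.Begin): the return value will be exactly the _origin value. This relies on an implementation detail (Seek returns the internal buffer index, _origin plus the offset, rather than the stream-relative position that Position reports), so it is not documented behavior.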

marian.pascalau
  • This should be the accepted answer. Also, looking at the source code, MemoryStream's _origin is only ever != 0 if the user has supplied an index to the constructor. When MemoryStream itself allocates memory, _origin is always 0. – avl_sweden Nov 08 '16 at 07:33
  • Agreed. This is the best option if using .NET 4.5 or earlier. If you're on 4.6 or later, the TryGetBuffer answer below is a little nicer aesthetically. – aggieNick02 May 31 '18 at 15:24

.NET 4.6 added a new API, bool MemoryStream.TryGetBuffer(out ArraySegment<byte> buffer), that is similar in spirit to GetBuffer(). When it can, it returns an ArraySegment whose Offset carries the _origin information and whose Count equals the stream's Length.

See this question for details about when .TryGetBuffer() will return true and populate the out param with useful information.
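A sketch of the TryGetBuffer approach, assuming .NET Framework 4.6+ or .NET Core. `SegmentDemo` and `XorAll` are hypothetical names. The consumer reads the underlying array in place when TryGetBuffer succeeds and falls back to the copying ToArray() when it returns false, which happens when the stream's buffer is not publicly visible (for example, a stream created with new MemoryStream(byte[])).

```csharp
using System;
using System.IO;

// Hypothetical helper class for illustration only; requires .NET Framework 4.6+ or .NET Core.
public static class SegmentDemo
{
    // Hypothetical consumer: XORs together the bytes the stream actually holds.
    public static byte XorAll(MemoryStream stream, out bool copied)
    {
        ArraySegment<byte> data;
        if (stream.TryGetBuffer(out data))
        {
            copied = false;   // data.Offset already accounts for _origin
        }
        else
        {
            copied = true;    // buffer not exposable: fall back to a copy
            data = new ArraySegment<byte>(stream.ToArray());
        }

        byte result = 0;
        for (int i = data.Offset; i < data.Offset + data.Count; i++)
        {
            result ^= data.Array[i];
        }
        return result;
    }
}
```

Iterating from data.Offset to data.Offset + data.Count is exactly what GetBuffer() alone cannot support, since it does not report the offset.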

Community
chwarr
  • This is an awesome option if on .NET 4.6 or later. Your linked question is great too, and the .NET reference source shows TryGetBuffer will work whenever GetBuffer works, so it definitely seems like a better option if you've got it. – aggieNick02 May 31 '18 at 15:26

ToArray() is the alternative to GetBuffer(). However, ToArray() makes a copy of the data in memory. If that copy is 85,000 bytes or larger, it is allocated on the Large Object Heap (LOH). So far nothing fancy. However, the GC does not handle the LOH and the objects in it very well: the LOH is only collected during generation 2 collections and is not compacted by default, so memory is not freed as you might expect, and an OutOfMemoryException can occur. The solution is either to call GC.Collect() so that those objects get collected, or to use GetBuffer() and create several smaller objects (each under 85,000 bytes); those do not go to the LOH, and the GC frees their memory as expected.
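A minimal sketch of the "several smaller objects" idea, assuming the stream allocated its own buffer (so the data starts at index 0 of GetBuffer()). `ChunkSplitter` and the 81,920-byte chunk size are illustrative choices, not part of any API; 81,920 keeps each chunk array comfortably below the LOH threshold.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper class for illustration only.
public static class ChunkSplitter
{
    // Keep chunks comfortably below 85,000 bytes: the threshold applies to the
    // whole array object, including its header, not just the element count.
    public const int DefaultChunkSize = 81920;

    // Copies the stream's valid bytes into several small arrays instead of one large ToArray() copy.
    // Assumption: the stream was created with new MemoryStream(), so its data starts at index 0.
    public static List<byte[]> Split(MemoryStream stream, int chunkSize = DefaultChunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        byte[] buffer = stream.GetBuffer();
        int length = checked((int)stream.Length);
        var chunks = new List<byte[]>();

        for (int offset = 0; offset < length; offset += chunkSize)
        {
            int size = Math.Min(chunkSize, length - offset);
            var chunk = new byte[size];
            Buffer.BlockCopy(buffer, offset, chunk, 0, size);
            chunks.Add(chunk);
        }
        return chunks;
    }
}
```

The total memory copied is the same as with ToArray(), but no single allocation is large enough to land on the LOH.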

A third (better) option is to use only streams, e.g. read the bytes from the MemoryStream and write them directly to HttpResponse.OutputStream, again using a byte array smaller than 85,000 bytes as the buffer. However, this is not always possible (it was not possible in my case).
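The stream-only option can be sketched as a copy loop through one small, reused buffer. `StreamPump.Copy` is a hypothetical helper, and the destination is any writable Stream (HttpResponse.OutputStream in the answer's scenario). Stream.CopyTo(destination, bufferSize), available since .NET 4.0, does essentially the same thing and is the simpler choice where it exists.

```csharp
using System;
using System.IO;

// Hypothetical helper class for illustration only.
public static class StreamPump
{
    // Copies 'source' to 'destination' through one small reusable buffer, so no array
    // anywhere near the size of the data is allocated (and nothing lands on the LOH).
    public static long Copy(Stream source, Stream destination, int bufferSize = 81920)
    {
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));

        var buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

When the source is a MemoryStream that was just written to, set its Position back to 0 first. Alternatively, MemoryStream.WriteTo(destination) writes the whole contents straight from the internal buffer without any intermediate array.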

In summary, when an in-memory copy of the data is not desired, you have to avoid ToArray(); in those cases GetBuffer() might come in handy, but it might not be the best solution.

Unknown
  • I'm pretty sure the original poster knew that ToArray creates a copy, and that GetBuffer could be more performant because of this. However, the question is "how can GetBuffer be used correctly", which your answer does not address. Without getting at the _origin field, we cannot know where the real data in the returned buffer starts. – avl_sweden Nov 08 '16 at 07:29

It can be useful if you're using a low-level API that takes an ArraySegment, such as Socket.Send (its overload that accepts an IList<ArraySegment<byte>>). Rather than calling ToArray, which creates another copy of the array, you can create a segment:

var segment = new ArraySegment<byte>(stream.GetBuffer(), 0, (int)stream.Length); // assumes the stream allocated its own buffer (_origin == 0)

and then pass that to the Send method. For large data this will avoid allocating a new array and copying into it, which could be expensive.
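A sketch of building the segment without copying and handing it to Socket.Send through its IList<ArraySegment<byte>> overload. `SegmentSender` is a hypothetical name. This variant swaps in TryGetBuffer (.NET 4.6+) so the offset is right even when _origin is not 0, and falls back to ToArray() when the buffer is not exposable. `Send` assumes an already-connected Socket.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Sockets;

// Hypothetical helper class for illustration only.
public static class SegmentSender
{
    // Wraps the stream's valid bytes in an ArraySegment, without copying when possible.
    public static ArraySegment<byte> ToSegment(MemoryStream stream)
    {
        if (stream.TryGetBuffer(out ArraySegment<byte> segment))
        {
            return segment;                                  // Offset already includes _origin
        }
        return new ArraySegment<byte>(stream.ToArray());     // buffer not exposable: copy
    }

    // Assumption: 'socket' is already connected. Returns the number of bytes sent.
    public static int Send(Socket socket, MemoryStream stream)
    {
        var buffers = new List<ArraySegment<byte>> { ToSegment(stream) };
        return socket.Send(buffers);
    }
}
```

Keeping ToSegment separate from Send makes the zero-copy part usable with any other API that accepts an ArraySegment<byte>.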

Sean

The most important point from the GetBuffer MSDN documentation, other than it not creating a copy of the data, is that it returns an array that has unused bytes:

Note that the buffer contains allocated bytes which might be unused. For example, if the string "test" is written into the MemoryStream object, the length of the buffer returned from GetBuffer is 256, not 4, with 252 bytes unused. To obtain only the data in the buffer, use the ToArray method; however, ToArray creates a copy of the data in memory.

So if you really want to avoid creating a copy because of memory constraints, you have to be careful not to send the whole array from GetBuffer over the wire or dump it to a file or attachment: the buffer at least doubles in size whenever it fills up, so it usually has a lot of unused bytes at the end.
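A small sketch of the "test" example from the documentation, showing the gap between the buffer's length and the data's length, and writing only the valid bytes. `UsedBytes`, `Measure`, and `WriteUsed` are hypothetical names; the sketch assumes the stream allocated its own buffer, so the data starts at index 0.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class for illustration only.
public static class UsedBytes
{
    // Writes "test" into a fresh MemoryStream and reports buffer length vs. data length.
    public static void Measure(out int bufferLength, out long dataLength)
    {
        var stream = new MemoryStream();
        byte[] text = Encoding.ASCII.GetBytes("test");
        stream.Write(text, 0, text.Length);

        bufferLength = stream.GetBuffer().Length;  // spare capacity included
        dataLength = stream.Length;                // only what was written
    }

    // Writes only the valid bytes to 'destination' instead of the whole GetBuffer() array.
    // Assumption: 'source' allocated its own buffer (_origin == 0). MemoryStream.WriteTo(destination)
    // does the same thing and also handles a non-zero _origin.
    public static void WriteUsed(MemoryStream source, Stream destination)
    {
        destination.Write(source.GetBuffer(), 0, checked((int)source.Length));
    }
}
```

Measure reports a 256-byte buffer holding 4 bytes of data, matching the documentation's example.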

Saeb Amini

GetBuffer() always assumes you know the structure of the data fed into the stream (and that's its use). If you want to get data out of the stream, you should always use one of the provided methods (e.g. ToArray()).

Something like this can be used, but the only case I can think of right now is some fixed structure or virtual file system stored in the stream. For example, at your current position you read the offset of a file stored inside the stream. You then create a new stream object based on this stream's buffer but with a different _origin. This saves you from copying the whole data for the new object, which can save lots of memory. It also saves you from carrying the initial buffer around as a reference, because you can always retrieve it again.
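A minimal sketch of the "virtual file system" idea, assuming the parent stream's buffer is publicly visible and its data starts at index 0 (true for a stream created with new MemoryStream()). `SubStream.Open` is a hypothetical helper: it wraps a slice of the parent's array in a second, read-only MemoryStream whose _origin is the slice offset, so no bytes are copied.

```csharp
using System;
using System.IO;

// Hypothetical helper class for illustration only.
public static class SubStream
{
    // Opens a read-only view of 'count' bytes starting at 'offset' within the parent's data.
    // The child shares the parent's array: no copy is made, and the child's _origin is 'offset'.
    public static MemoryStream Open(MemoryStream parent, int offset, int count)
    {
        if (offset < 0 || count < 0 || (long)offset + count > parent.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        byte[] buffer = parent.GetBuffer();  // assumption: parent's data starts at index 0
        return new MemoryStream(buffer, offset, count, writable: false, publiclyVisible: true);
    }
}
```

One caveat of sharing the array: if the parent stream grows after the child is opened, it may switch to a newly allocated array, and the child keeps pointing at the old one.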

Mario
  • I agree it saves you from having to carry the original reference with you. Perhaps I've asked the question wrong... `GetBuffer()` seems like it would have some real usefulness and functionality to offer if there was a public property getter around `_origin`. I wonder why such a property is not made available... – aggieNick02 Nov 26 '12 at 15:37