
I'm handling a long-running process that does a lot of copying to a memory stream so it can create an archive and upload it to Azure. The archive can exceed 2 GB in size. The process is throwing an OutOfMemoryException at some point, and I'm trying to figure out what the upper limit is so I can prevent that: stop the process, upload a partial archive, and pick up where it left off.

But I'm not sure how to determine how much memory the stream is actually using. Right now, I'm using MemoryStream.Length, but I don't think that's the right approach.

So, how can I determine how much memory a MemoryStream is using in VB.Net?

In compliance with the Minimal, Reproducible Example requirements -

    Imports System.IO

    Module Program
        Sub Main(args As String())
            Dim stream = New MemoryStream
            stream.WriteByte(10)
            'What would I need to put here to determine the size in memory of stream?
        End Sub
    End Module
Will
  • https://learn.microsoft.com/en-us/dotnet/api/system.io.memorystream.capacity?view=net-7.0? – GSerg Feb 13 '23 at 22:38
  • thanks, but that doesn't suit my purposes. I set a breakpoint to monitor the Capacity property value, and the system seemed to increase it whenever the MemoryStream.Length property reached it. – Will Feb 13 '23 at 22:51
  • @Will What they're suggesting is you set the capacity for the stream up front. It might not be intuitive, but it can greatly help with this issue (though not always eliminate it completely). – Joel Coehoorn Feb 13 '23 at 22:55
  • Length is the correct property to use; it is very efficient. The hard limit in a 64-bit process is the [maximum size](https://stackoverflow.com/questions/3944320/maximum-length-of-byte) of a Byte() array: Integer.MaxValue - 56. – Hans Passant Feb 13 '23 at 23:05

1 Answer


This is a known issue with the garbage collector for .Net Framework (I've heard .Net Core may have addressed this, but I haven't done a deep dive myself to confirm it).

What happens is you have an item with an internal (growing) buffer like a string, StringBuilder, List, MemoryStream, etc. As you write to the object, the buffer fills and at some point needs to be replaced. So behind the scenes a new buffer is allocated, and data is copied from the old to the new. At this point the old buffer becomes eligible for garbage collection.

Already we see one large inefficiency: copying from the old buffer to the new can be a significant cost as the object grows. Therefore, if you have an idea of your item's final size up front, you probably want to use a mechanism that lets you set that initial size. With a List(Of T), for example, this means using the constructor overload that accepts a Capacity... if you know it. That way the object never (or rarely) has to copy the data between buffers, as in the sketch below.
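Here's a minimal sketch of pre-sizing in VB.Net; the 100 MB figure is a made-up estimate for illustration, not something from the question:

    Imports System.IO

    Module PresizeDemo
        Sub Main()
            'Allocate the full buffer once, up front, instead of letting it grow.
            Dim expectedSize As Integer = 100 * 1024 * 1024 '100 MB, illustrative only
            Dim stream As New MemoryStream(expectedSize)

            'The same idea applies to List(Of T) via its capacity constructor.
            Dim items As New List(Of Integer)(capacity:=1000000)
        End Sub
    End Module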

But we're not done yet. When the copy operation is finished, the garbage collector WILL appropriately reclaim the memory and return it to the operating system. However, there's more involved than just memory. There's also the virtual address space for your application's process. The virtual address space formerly occupied by that memory is NOT immediately released. It can be reclaimed through a process called "compaction", but there are situations where this just... doesn't happen. Most notably, memory on the Large Object Heap (LOH) is rarely-to-never compacted. Once your item eclipses a magic 85,000-byte threshold it lives on the LOH, and from then on every buffer swap leaves a hole in the virtual address space. Accumulate enough of these holes, and the virtual address space runs out of room, resulting in (drumroll please)... an OutOfMemoryException. This is the exception thrown even though there may be plenty of memory available.
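On .Net Framework 4.5.1 and later you can at least ask the runtime to compact the LOH once, on the next blocking collection. A minimal sketch; whether it actually recovers address space depends on your allocation pattern:

    Imports System.Runtime

    Module LohCompactionDemo
        Sub Main()
            'Request a one-time LOH compaction; it runs during the next
            'blocking Gen 2 collection, then the setting resets itself.
            GCSettings.LargeObjectHeapCompactionMode =
                GCLargeObjectHeapCompactionMode.CompactOnce
            GC.Collect()
        End Sub
    End Module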

Since most of the items that create this scenario use a doubling algorithm for the underlying buffer, it's possible to generate these exceptions without using all that much live memory: the abandoned buffers together occupy roughly as much address space as the final buffer, so in the worst case only about half the consumed address space actually holds live data when the exception arrives, and fragmentation from other allocations can make it worse. The loop below illustrates the arithmetic.
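This is purely illustrative arithmetic, not a measurement:

    Module DoublingDemo
        Sub Main()
            'Simulate a buffer that doubles from 256 bytes up to 2 GB,
            'tallying the address space left behind by the old copies.
            Dim capacity As Long = 256
            Dim abandoned As Long = 0
            While capacity < 2L * 1024 * 1024 * 1024
                abandoned += capacity 'the old buffer becomes a hole
                capacity *= 2         'the new buffer is twice as large
            End While
            'Prints roughly 2 GB live and 2 GB of holes.
            Console.WriteLine($"Live: {capacity:N0}  Abandoned: {abandoned:N0}")
        End Sub
    End Module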

To avoid this, be careful in how you use items with buffers that grow dynamically: set the full capacity up front when you can, or stream to a more robust backing store (like a file) when you can't. A sketch of the file-backed approach follows.
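In this sketch, ZipArchive writes directly to a FileStream so the archive bytes never touch the managed heap; UploadToAzure is a hypothetical stand-in for whatever Azure client call the process already uses:

    Imports System.IO
    Imports System.IO.Compression

    Module FileBackedArchiveDemo
        Sub Main()
            Dim tempPath = Path.GetTempFileName()

            'Build the archive directly on disk instead of in a MemoryStream.
            Using fileStream As New FileStream(tempPath, FileMode.Create)
                Using archive As New ZipArchive(fileStream, ZipArchiveMode.Create)
                    'archive.CreateEntry(...) for each item in the data set
                End Using
            End Using

            'Re-open the finished file and hand the stream to the uploader.
            Using readBack As New FileStream(tempPath, FileMode.Open, FileAccess.Read)
                UploadToAzure(readBack) 'hypothetical upload helper
            End Using

            File.Delete(tempPath)
        End Sub

        Sub UploadToAzure(source As Stream)
            'Stand-in: the real code would pass the stream to the Azure SDK.
        End Sub
    End Module

The disk absorbs the multi-gigabyte payload, so the growing-buffer problem never arises no matter how large the data set turns out to be.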

Joel Coehoorn
  • That's informative, thanks. Unfortunately, the data set is fluid and unknowable ahead of time, so it's impossible to know precisely how much memory the process will require. Streaming to a file might work... then read the file back into memory, and put that up on Azure... I can look into that. Thanks for the information. – Will Feb 13 '23 at 22:55
  • You can sometimes use an absurdly large capacity to control this. For example, web servers typically have a configured limit to how large an upload stream can grow. If you set your memory stream capacity to match, you're greatly over-allocating the majority of the time, but at least you know you're within your limits. – Joel Coehoorn Feb 13 '23 at 22:57
  • Thanks for the suggestion - definitely, a FileStream is going to be the way to go with this. – Will Feb 14 '23 at 15:17