3

I'm having problems with out-of-memory exceptions when using a .NET MemoryStream if the data is large and the process is 32-bit.

I believe that the System.IO.Packaging API silently switches from memory to file-backed storage as the data volume increases, and on the face of it, it seems it would be possible to implement a subclass of MemoryStream that does exactly the same thing.
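Conceptually, I'm imagining something along these lines: a rough, untested sketch (the `HybridStream` name and the 16 MB threshold are just placeholders) that wraps either a MemoryStream or a temp-file FileStream rather than literally subclassing MemoryStream, and assumes data is written sequentially before being read back:

```csharp
using System.IO;

// Rough sketch only: starts in memory, spills to a delete-on-close temp
// file once the data grows past an arbitrary threshold.
public class HybridStream : Stream
{
    private const long Threshold = 16 * 1024 * 1024; // 16 MB cut-over point
    private Stream _inner = new MemoryStream();
    private bool _spilled;

    public override void Write(byte[] buffer, int offset, int count)
    {
        if (!_spilled && _inner.Length + count > Threshold)
        {
            // Copy what we have so far into a temp file that the OS
            // deletes automatically when the stream is closed.
            var file = new FileStream(Path.GetTempFileName(), FileMode.Create,
                FileAccess.ReadWrite, FileShare.None, 4096, FileOptions.DeleteOnClose);
            long position = _inner.Position;
            _inner.Position = 0;
            _inner.CopyTo(file);
            file.Position = position;
            _inner.Dispose();
            _inner = file;
            _spilled = true;
        }
        _inner.Write(buffer, offset, count);
    }

    // Everything else just delegates to whichever stream is current.
    public override int Read(byte[] buffer, int offset, int count) => _inner.Read(buffer, offset, count);
    public override long Seek(long offset, SeekOrigin origin) => _inner.Seek(offset, origin);
    public override void SetLength(long value) => _inner.SetLength(value);
    public override void Flush() => _inner.Flush();
    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => _inner.CanSeek;
    public override bool CanWrite => _inner.CanWrite;
    public override long Length => _inner.Length;
    public override long Position { get => _inner.Position; set => _inner.Position = value; }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```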

Does anyone know of such an implementation? I'm pretty sure there is nothing in the framework itself.

Andy
  • Without information about how you're using that `MemoryStream`, it's really hard to make a specific recommendation. – Jim Mischel Jul 29 '13 at 13:15
  • Old question, but MSFT now has a replacement for `MemoryStream`: [`RecyclableMemoryStream`](https://github.com/microsoft/Microsoft.IO.RecyclableMemoryStream) from the nuget package [Microsoft.IO.RecyclableMemoryStream](https://www.nuget.org/packages/Microsoft.IO.RecyclableMemoryStream/). It is a *library to provide pooling for .NET MemoryStream objects to improve application performance, especially in the area of garbage collection.* – dbc Jul 20 '23 at 23:54

4 Answers

10

Programmers try too hard to avoid using a file. The difference between memory and a file is a very small one in Windows. Any memory you use for a MemoryStream in fact requires a file: the storage is backed by the paging file, c:\pagefile.sys. And the reverse is true as well: any file you use is backed by memory, because file data is cached in RAM by the file system cache. So if the machine has sufficient RAM, then with a FileStream you will in fact only read and write from/to memory, and you get the performance you expect from using memory. It is entirely free; you don't have to write any code to enable this, nor do you have to manage it.

If the machine doesn't have enough RAM, both approaches deteriorate the same way. When you use a MemoryStream, the paging file starts thrashing and you'll be slowed down by the disk. When you use a file, the data won't fit in the file system cache and you'll be slowed down by the disk.

You'll of course get the benefit of using a file: you won't run out of memory anymore. Use a FileStream instead.
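For example, a temporary file opened with FileOptions.DeleteOnClose behaves, from the caller's point of view, much like a MemoryStream that cleans up after itself. A minimal sketch (buffer and data sizes are arbitrary):

```csharp
using System.IO;

// Minimal sketch: a seekable read/write stream backed by a temp file
// that Windows deletes automatically when the stream is closed.
using (var stream = new FileStream(
    Path.GetTempFileName(),
    FileMode.Create, FileAccess.ReadWrite, FileShare.None,
    4096, FileOptions.DeleteOnClose))
{
    byte[] chunk = new byte[81920];
    for (int i = 0; i < 1000; i++)          // ~80 MB, no large contiguous allocation
        stream.Write(chunk, 0, chunk.Length);

    stream.Position = 0;                    // read it back like any other Stream
    int read = stream.Read(chunk, 0, chunk.Length);
}
```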

Hans Passant
  • -1 There are a lot of points I believe are flat-out wrong in this answer, but in the interest of space I'll only challenge this: "So if the machine has sufficient RAM then you will in fact only read and write from/to memory if you use a FileStream". This is absolutely not true. I'm currently working on a project where I'm writing 3 GB to a FileStream on a system with more than 50 GB of RAM, and I can tell you for a fact it's being continuously persisted to disk. And MemoryStreams are not. – René Mar 11 '16 at 21:07
  • -1 ... I use MemoryStream all of the time and have no page file enabled on my Windows system. Can you cite your source saying a MemoryStream always uses a file on the disk? From https://github.com/Microsoft/referencesource/blob/master/mscorlib/system/io/memorystream.cs: // A MemoryStream represents a Stream in memory (ie, it has no backing store). // This stream may reduce the need for temporary buffers and files in // an application. – Jeffrey Kevin Pry Oct 06 '17 at 15:02
  • These are just the basics of a demand-paged virtual memory operating system. RAM pages are backed by the paging file. If you don't have one, then you'd better have a lot of RAM, not that hard to come by these days. If the OS is under pressure anyway, then it will start pilfering pages that it *can* unmap. Generally that will be pages with code, backed by the executable file. Mapping them back in when the code needs to run uses the disk to restore their content. Perhaps you can set such a hardware requirement for your customer as well, I never can. – Hans Passant Oct 06 '17 at 15:22
  • I'm confused. When using a FileStream, on a low memory system you'll avoid OOM and on a high memory system you'll benefit from writing to memory. But then you say in your comment it's wrong to use a FileStream instead of a MemoryStream on a high memory system? Can you please clarify, thanks. – user247702 Apr 18 '18 at 14:49
  • True about paging. However, sometimes using a MemoryStream, or any other kind of memory storage, to hold temporary data makes more sense than using files. For example, using files leaves more mopping up to do if sensitive data is being processed, and is more prone to being breached if the process crashes before it manages to properly delete the temporary files. – Teorist Sep 28 '18 at 17:40
2

This is expected to happen with MemoryStream, so you should implement your own logic or use an external class. Here is a post that explains the problems with MemoryStream and big data, and it offers an alternative: A replacement for MemoryStream.

Swift
  • Thanks, I did find that before but their MemoryTributary only allows approx double the data size of a standard MemoryStream, so it's not a general solution – Andy Jul 29 '13 at 15:01
1

We've run into similar obstacles on my team. Some commenters have suggested that developers need to be more willing to use files. If using the filesystem directly is an option, do that, but it isn't always.

If, like we did, you want to pass data read from a file around your application, you can't just pass the FileStream object, because it can get disposed before you're done reading the data. We originally resorted to MemoryStreams to let us pass the data around easily, but ran into the same problem.

We've used a couple different workarounds to mitigate the problem.

Options we've used include:

  • Implement a wrapper class that stores the data in multiple byte[] objects (since arrays are still limited to int.MaxValue entries) and exposes methods that let you treat them almost like a Stream; a rough sketch follows this list. We still try to avoid this at all costs.
  • Use some sort of "token" to pass a reference to the location of the data and wait to load the data "just in time" in the application.
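A heavily simplified sketch of the first option (hypothetical names, append-then-read only), just to show the shape of it:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Simplified sketch of the "list of chunks" idea: data lives in many
// modest byte[] blocks instead of one huge contiguous buffer, so no
// single allocation ever needs a giant run of address space.
public class ChunkedBuffer
{
    private const int ChunkSize = 64 * 1024;            // arbitrary block size
    private readonly List<byte[]> _chunks = new List<byte[]>();
    public long Length { get; private set; }

    public void Append(byte[] data, int offset, int count)
    {
        while (count > 0)
        {
            int posInChunk = (int)(Length % ChunkSize);
            if (posInChunk == 0) _chunks.Add(new byte[ChunkSize]);
            int toCopy = Math.Min(count, ChunkSize - posInChunk);
            Buffer.BlockCopy(data, offset, _chunks[_chunks.Count - 1], posInChunk, toCopy);
            offset += toCopy; count -= toCopy; Length += toCopy;
        }
    }

    // Hand the data back to any Stream without ever concatenating it.
    public void CopyTo(Stream destination)
    {
        long remaining = Length;
        foreach (byte[] chunk in _chunks)
        {
            int take = (int)Math.Min(remaining, chunk.Length);
            destination.Write(chunk, 0, take);
            remaining -= take;
        }
    }
}
```

The real class also needs random-access Read and Seek support, which is where most of the complexity (and the reason we try to avoid it) lives.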
Steve M
-1

I'd suggest checking out this project.

http://www.codeproject.com/Articles/348590/A-replacement-for-MemoryStream

I believe the problem with memory streams comes from the fact that, underneath it all, they are still a fancy wrapper around a single byte[], and so are still constrained by .NET's requirement that all objects be smaller than 2 GB, even in 64-bit programs. The above implementation breaks the byte[] into several separate byte[]s.
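If you want to see the single-buffer behaviour for yourself, a rough illustration like the one below prints the capacity roughly doubling at every reallocation. In a 32-bit process this will usually fail with an OutOfMemoryException well before physical RAM runs out, because every reallocation needs one contiguous block of address space:

```csharp
using System;
using System.IO;

// Illustration: MemoryStream keeps everything in one byte[]; each time it
// fills up, a new array of (at least) double the size is allocated and the
// old contents are copied over.
var ms = new MemoryStream();
var block = new byte[1024 * 1024];          // write 1 MB at a time
int lastCapacity = 0;

for (int i = 0; i < 1024; i++)              // aim for 1 GB in total
{
    ms.Write(block, 0, block.Length);
    if (ms.Capacity != lastCapacity)
    {
        Console.WriteLine($"Length {ms.Length:N0} -> Capacity {ms.Capacity:N0}");
        lastCapacity = ms.Capacity;
    }
}
```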

Andrew Long