3

I have a large object in my C# code which can be as large as 15GB. Internally it has a 2D array of doubles and 2 lists of strings which describe the rows and columns of the 2D array.

This object has a method WriteToTextWriter(StreamWriter s) which writes a header and the entire data in the 2D array to the StreamWriter s. The StreamWriter is initialized using a MemoryStream object.

I have another class which uses HttpClient to post data from a Stream to a remote server. It has a method PostStreamData(string URL, Stream s).

My current code is something like this:

var x = new MyLargeObject();
using (var memStream = new MemoryStream())
using (var streamWriter = new StreamWriter(memStream))
{
    x.WriteToTextWriter(streamWriter);
    streamWriter.Flush();      // make sure everything reaches the MemoryStream
    memStream.Position = 0;    // rewind before posting
    customClient.PostStreamData(url, memStream);
}

Internally, PostStreamData creates a StreamContent from the stream it is passed, sets this content as the Content property of an HttpRequestMessage, and finally sends it using the SendAsync method.
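In other words, PostStreamData does roughly this (a sketch of the behaviour described above, not the exact code; the field and variable names are approximations):

```csharp
// Rough sketch of what PostStreamData does internally, per the description
// above; "httpClient" is an assumed HttpClient field of the class.
public async Task PostStreamData(string url, Stream s)
{
    var request = new HttpRequestMessage(HttpMethod.Post, url)
    {
        Content = new StreamContent(s)
    };
    using (var response = await httpClient.SendAsync(request))
        response.EnsureSuccessStatusCode();
}
```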

Since this uses a MemoryStream, it fails when the object size grows beyond 2GB. See this: Failed to write large amount of data to stream

To overcome this, I used the HugeMemoryStream class implemented there. But now the issue is that I am using twice the memory: 15GB for MyLargeObject, which is already in memory, and another 15GB for the HugeMemoryStream object created from it.

I think a better solution would be to implement a Stream-based class that uses a buffer of limited size but still allows for objects larger than 2GB. How do I implement this? I am looking for some sample code. It doesn't have to be complete, but right now I don't even know where to start.

Stupid Man
  • If you want to reduce memory usage, don't use a memory stream. You probably want to send the data over HTTP in a number of smaller requests anyway. – Jeremy Lakeman Sep 29 '22 at 05:27
  • I had a similar problem back then. The destination device was a small arm64 device with limited RAM, but it had enough storage. So whenever we had to send an update to that device, we had to split the byte arrays into multiple packages. The first message we sent to the device said how many packages and kilobytes it should expect. After receiving each package, it checked whether the package had arrived intact. If yes, we concatenated the received packages. That is one way to handle it. – tataelm Oct 02 '22 at 08:23
  • Have you checked why the additional memory is being allocated? My blind guess is that you are allocating strings in the memory stream. The string representation of an int or float takes up more space than the binary representation. If the binary objects are loaded first and the strings are then created, you will have the original and a bloated copy in memory – Surya Pratap Oct 04 '22 at 07:49
  • Why are you unable to write directly to the request stream in HttpClient? That should remove the need for the additional memory stream – Surya Pratap Oct 04 '22 at 07:54

3 Answers

2

You could inherit from Stream and keep a reference to MyLargeObject. Then you implement the Read method, where you serialize your large object into the byte-array parameter of Read. You must override CanSeek and CanWrite to return false (and CanRead to return true); the other methods can simply throw NotSupportedException. You would use it like this:

var content = new StreamContent(new MyStream(myLargeObject));

Also check out this implementation: https://ec.europa.eu/digital-building-blocks/code/projects/EDELIVERY/repos/eessi-as4.net/browse/source/AS4/Eu.EDelivery.AS4/Streaming/VirtualStream.cs?at=a37db0be60a5c441fdb6c9d65f7c4c4621840b92
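A minimal sketch of such a stream might look like the following. `RowCount` and `GetRowText` are hypothetical members standing in for whatever row-by-row access MyLargeObject actually provides; the point is that only one row's worth of bytes is materialized at a time:

```csharp
using System;
using System.IO;
using System.Text;

// Read-only, forward-only stream that produces the object's text on demand
// instead of buffering all 15GB. MyLargeObject.RowCount and GetRowText(int)
// are assumed members; adapt them to the real class.
public class MyStream : Stream
{
    private readonly MyLargeObject _source;
    private int _row;                             // next row to serialize
    private byte[] _pending = Array.Empty<byte>();
    private int _pendingOffset;

    public MyStream(MyLargeObject source) => _source = source;

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Refill the small pending buffer one row at a time.
        if (_pendingOffset == _pending.Length)
        {
            if (_row >= _source.RowCount)
                return 0;                          // end of stream
            _pending = Encoding.UTF8.GetBytes(_source.GetRowText(_row++));
            _pendingOffset = 0;
        }
        int n = Math.Min(count, _pending.Length - _pendingOffset);
        Array.Copy(_pending, _pendingOffset, buffer, offset, n);
        _pendingOffset += n;
        return n;
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Memory stays bounded because the stream holds only a reference to the large object plus one row's encoded bytes, regardless of the total size.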

bgman
  • > where you serialize your largeobject to the byte array parameter of Read — @bgman Can you please explain this part? `MemoryStream` uses a one-dimensional byte array as its buffer, so how do I serialize a 15GB object into that? – Stupid Man Sep 30 '22 at 07:34
  • MyStream has a reference to the large object, and you implement the Read method, where you have to put into the array parameter a number of bytes starting from the current position. Check the docs for Stream.Read. I'm just saying you only have to implement the Read method, but this might be a difficult task – bgman Sep 30 '22 at 09:03
  • I was discussing this problem with a colleague and he too suggested what you are telling me, but I don't understand how this will reduce memory usage. The code in my post has this line: `x.WriteToTextWriter(streamWriter);` The WriteToTextWriter method writes the entire 15GB of data to the StreamWriter. Will this also need some modifications? – Stupid Man Sep 30 '22 at 09:16
  • You use your custom stream directly to create the StreamContent, as I showed. If you use WriteToTextWriter to write to a stream, then you'll have two large objects: the object itself and the stream, as you pointed out. The custom stream doesn't increase memory usage because it holds just a reference to the large object. Still, it will not be easy to implement the Read method; you should adapt WriteToTextWriter. It's not easy because you have to keep track of the current position. For example: the first read reads 1000 bytes; on the second read you have to give the next 1000 bytes. – bgman Sep 30 '22 at 13:16
  • I was able to implement the kind of stream you have described here. The problem now is that for larger files (> 2GB), I am getting this exception: "Cannot write more bytes to the buffer than the configured maximum buffer size: 2147483647" I did try the solution described here https://stackoverflow.com/questions/18720435/httpclient-buffer-size-limit-exceeded which suggests using HttpCompletionOption.ResponseHeadersRead but it hasn't made any difference. Continued in next comment... – Stupid Man Jun 23 '23 at 12:58
  • Even with this, my code fails with the same exception at the same line. The point where I am making the POST request. Surprisingly, the HugeMemoryStream solution which I had copied from the question linked in my original post works fine. Note that my newly implemented stream where I have just implemented the Read method, works fine with files up to 2GB. Any idea, what could be wrong? – Stupid Man Jun 23 '23 at 12:58
1

I don't think it's a good idea at all to use memory to manage data like this; you're better off writing it to disk and then posting it through a separate service with parallel uploads and retry logic.
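For example, spooling to a temporary file first might look like this (MyLargeObject, WriteToTextWriter, customClient, and PostStreamData are from the question; the temp-file handling is just a sketch):

```csharp
using System.IO;

// Write the object to disk once, then upload from a FileStream so memory
// stays bounded regardless of the object's size.
var x = new MyLargeObject();
string tempPath = Path.GetTempFileName();
try
{
    using (var fileStream = File.Create(tempPath))
    using (var writer = new StreamWriter(fileStream))
    {
        x.WriteToTextWriter(writer);
    }
    using (var uploadStream = File.OpenRead(tempPath))
    {
        customClient.PostStreamData(url, uploadStream);
    }
}
finally
{
    File.Delete(tempPath);   // clean up the spool file
}
```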

Oh, and if you're also working with big binary files like this, you might want to have a look at using some kind of diff algorithm to generate patch files and use those instead, like bsdiff or xdelta.

Useful Binary Diff Tool (other than msdn[apatch and mpatch], xdelta, bsdiff, vbindiff and winmerge)

Pedro Luz
1

Use PushStreamContent (in System.Net.Http.Formatting.dll):

var x = new MyLargeObject();
var content = new PushStreamContent((stream, httpContent, transportContext) =>
{
    using (var streamWriter = new StreamWriter(stream))
        x.WriteToTextWriter(streamWriter);
    stream.Close();   // closing the stream signals the end of the request body
});

// Note: PostStreamData takes a Stream, so it would need an overload that
// accepts HttpContent directly, e.g. via httpClient.PostAsync(url, content).
customClient.PostStreamData(url, content);
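If you'd rather not take a dependency on System.Net.Http.Formatting, a similar push model can be sketched with a custom HttpContent. The row writing is delegated to the question's WriteToTextWriter; `leaveOpen: true` is needed so disposing the writer doesn't close the transport stream:

```csharp
using System.IO;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Streams the object directly into the request body; reporting an unknown
// length makes HttpClient use chunked transfer instead of buffering.
public class LargeObjectContent : HttpContent
{
    private readonly MyLargeObject _source;

    public LargeObjectContent(MyLargeObject source) => _source = source;

    protected override Task SerializeToStreamAsync(Stream stream, TransportContext context)
    {
        using (var writer = new StreamWriter(stream, Encoding.UTF8, 8192, leaveOpen: true))
            _source.WriteToTextWriter(writer);
        return Task.CompletedTask;
    }

    protected override bool TryComputeLength(out long length)
    {
        length = -1;      // unknown up front
        return false;
    }
}
```

Usage would then be `await httpClient.PostAsync(url, new LargeObjectContent(x));`.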
dovid