
I have a 32-bit .NET Core 3.1 app (the bitness is important: the 64-bit build doesn't crash, at least not with this much data).

It deserializes data that comes in from another process that I read via a named pipe:

TextReader reader = TextReader.Synchronized(new StreamReader(namedPipeServer));
string data = reader.ReadLine(); // messages are separated by newlines
string deserializedMessage = JsonConvert.DeserializeObject<string>(data); // string for simplicity; it's actually a more complex object

When the data is roughly 100 MB (i.e., data set to new string('a', 100 * 1024 * 1024)), Newtonsoft crashes with an OutOfMemoryException:

System.OutOfMemoryException
  HResult=0x8007000E
  Message=Exception of type 'System.OutOfMemoryException' was thrown.
  Source=Newtonsoft.Json
  StackTrace:
   at Newtonsoft.Json.Utilities.BufferUtils.RentBuffer(IArrayPool`1 bufferPool, Int32 minSize)
   at Newtonsoft.Json.JsonTextReader.PrepareBufferForReadData(Boolean append, Int32 charsRequired)
   at Newtonsoft.Json.JsonTextReader.ReadData(Boolean append, Int32 charsRequired)
   at Newtonsoft.Json.JsonTextReader.ReadStringIntoBuffer(Char quote)
   at Newtonsoft.Json.JsonTextReader.ReadStringValue(ReadType readType)
   at Newtonsoft.Json.JsonTextReader.ReadAsString()
   at Newtonsoft.Json.JsonReader.ReadForType(JsonContract contract, Boolean hasConverter)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(JsonReader reader, Type objectType, Boolean checkAdditionalContent)
   at Newtonsoft.Json.JsonSerializer.DeserializeInternal(JsonReader reader, Type objectType)
   at Newtonsoft.Json.JsonSerializer.Deserialize(JsonReader reader, Type objectType)
   at Newtonsoft.Json.JsonConvert.DeserializeObject(String value, Type type, JsonSerializerSettings settings)
   at Newtonsoft.Json.JsonConvert.DeserializeObject[T](String value, JsonSerializerSettings settings)
   at Newtonsoft.Json.JsonConvert.DeserializeObject[T](String value)
   at OutOfMemory.Program.Main(String[] args)

I found this related question that suggested using streams; however, I get the same crash doing that:

using JsonReader reader = new JsonTextReader(new StringReader(data));
JsonSerializer serializer = new JsonSerializer();
string deserializedMessage = serializer.Deserialize<string>(reader);
System.OutOfMemoryException
  HResult=0x8007000E
  Message=Exception of type 'System.OutOfMemoryException' was thrown.
  Source=Newtonsoft.Json
  StackTrace:
   at Newtonsoft.Json.Utilities.BufferUtils.RentBuffer(IArrayPool`1 bufferPool, Int32 minSize)
   at Newtonsoft.Json.JsonTextReader.PrepareBufferForReadData(Boolean append, Int32 charsRequired)
   at Newtonsoft.Json.JsonTextReader.ReadData(Boolean append, Int32 charsRequired)
   at Newtonsoft.Json.JsonTextReader.ReadStringIntoBuffer(Char quote)
   at Newtonsoft.Json.JsonTextReader.ReadStringValue(ReadType readType)
   at Newtonsoft.Json.JsonTextReader.ReadAsString()
   at Newtonsoft.Json.JsonReader.ReadForType(JsonContract contract, Boolean hasConverter)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(JsonReader reader, Type objectType, Boolean checkAdditionalContent)
   at Newtonsoft.Json.JsonSerializer.DeserializeInternal(JsonReader reader, Type objectType)
   at Newtonsoft.Json.JsonSerializer.Deserialize(JsonReader reader, Type objectType)
   at Newtonsoft.Json.JsonSerializer.Deserialize[T](JsonReader reader)
   at OutOfMemory.Program.Main(String[] args)

I have considered not reading the full line of data, but instead using a stream from the get-go, like so:

using JsonReader reader = new JsonTextReader(textReader);
JsonSerializer serializer = new JsonSerializer();
serializer.Deserialize<string>(reader);

But that just hangs. I think that's because, according to the docs, JsonSerializer.Deserialize reads the Stream to completion, and I might have multiple messages coming through on the Stream.
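For reference, JsonTextReader does have a SupportMultipleContent flag meant for exactly this situation (several top-level JSON values arriving on one reader). A minimal sketch, simplified to string messages; I'd still need to verify it against the real pipe:

```csharp
using System;
using System.IO;
using Newtonsoft.Json;

class MultiMessageReader
{
    // Reads a sequence of top-level JSON values from a single TextReader.
    public static void ReadAll(TextReader textReader)
    {
        var serializer = new JsonSerializer();
        using var reader = new JsonTextReader(textReader)
        {
            // Without this, hitting a second top-level value throws a JsonReaderException.
            SupportMultipleContent = true,
        };

        // Read() positions the reader on the next value; false means end of stream.
        while (reader.Read())
        {
            string message = serializer.Deserialize<string>(reader);
            Console.WriteLine(message?.Length ?? 0);
        }
    }
}
```

Note that this still materializes each string value in full, so by itself it wouldn't avoid the OOM for a single 100 MB string.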

Is there another solution to this, or am I just doing something wrong with the Streams?

Example Main method:

int mB = 150;
var serializedMessage = JsonConvert.SerializeObject(new string('a', mB * 1024 * 1024));

using JsonReader reader = new JsonTextReader(new StringReader(serializedMessage));
JsonSerializer serializer = new JsonSerializer();
string result = serializer.Deserialize<string>(reader);
//result = JsonConvert.DeserializeObject<string>(serializedMessage);
Console.WriteLine(result.Length);
Console.ReadKey();
pushkin

  • 100 MB of JSON is *a lot*. Even the well-optimized `System.Text.Json` struggles. What is posted above eats up about 1.4 GB of memory, and `System.Text.Json`'s memory usage is at about 1.0 GB. For example: `var fin = System.Text.Json.JsonSerializer.Deserialize(System.Text.Json.JsonSerializer.Serialize(new string('a', 100 * 1024 * 1024)));` You may have to figure out how to optimize how data is being sent to you. This isn't a bug/issue by any means. – Andy Jul 20 '22 at 22:06
  • @Andy fair enough - I'll speak with the team that's sending this data (it's clipboard data containing images) and see if it can be broken up into chunks – pushkin Jul 20 '22 at 22:36
  • So that means it's serializing a `byte[]` object, and Newtonsoft converts that type to a Base64 string. Base64 increases the amount of memory needed by 33%. I would consider breaking it into 5 MB chunks, then serializing that. That will give you ~7 MB of JSON, which is way more manageable when dealing with memory. Maybe even put a chunk index in the packet, so things stay in order. Just some friendly tips. – Andy Jul 20 '22 at 23:56
  • `string data = reader.ReadLine();` -- this extra allocation seems unnecessary. You can read a sequence of top-level JSON objects directly by setting `JsonTextReader.SupportMultipleContent` as shown in [Line delimited json serializing and de-serializing](https://stackoverflow.com/q/29729063/3744182). And, since your JSON is large, this allocation is likely to be large. – dbc Jul 21 '22 at 01:06
  • But it is true that Json.NET will always completely materialize each atomic individual string value. How long is the longest string value in your JSON? – dbc Jul 21 '22 at 01:08
  • But why are you deserializing to `string`? Is your other process sending a sequence of JSON serialized string values? If so, why use JSON at all? Why not just send a sequence of strings? Is your JSON actually double-serialized for some reason? – dbc Jul 21 '22 at 01:10
  • @dbc Interesting, will look into SupportMultipleContent (I only used JsonTextReader in my example code; I don't actually use the stream reader in my application; was just testing things out) – pushkin Jul 21 '22 at 14:46
  • @dbc The longest string value (there's really only one long one) in the workflow that triggers this crash is 113,851,273 characters (RTF data) – pushkin Jul 21 '22 at 14:47
  • @dbc Deserializing to a string was just a simple example to demonstrate the crash. I'm really deserializing to some wrapper object that contains arbitrary data provided by this other process. Could be a string or anything else. – pushkin Jul 21 '22 at 14:48
  • If your JSON contains an RTF file as an embedded 200KB string, what are you doing with that after you receive it? Are you writing it to file? The only .NET JSON parser I know of that doesn't fully materialize string values is the one returned by `JsonReaderWriterFactory`. see e.g. [JsonConvert Deserialize Object out of memory exception](https://stackoverflow.com/a/66095518/3744182) or [Efficiently replacing properties of a large JSON using System.Text.Json](https://stackoverflow.com/a/59850946/3744182). – dbc Jul 21 '22 at 16:05
  • @dbc It's a 100MB string I believe. The string gets sent from my .NET Core process through a Node process (Electron app) and then delivered to a web app hosted by my Electron app. Then, I'm not sure what the app does with it exactly. – pushkin Jul 21 '22 at 16:26
  • A string that big is going to cause garbage collection problems, especially in a 32-bit app. Did you enable [large object heap compaction](https://stackoverflow.com/q/20035550)? – dbc Jul 21 '22 at 16:39
  • @dbc I did not. What would that do for me? The string would exist in memory for a short period before being sent off to another process. – pushkin Jul 21 '22 at 17:57
  • For the reasons explained in the answer. Absent LOH compaction, due to memory fragmentation you can run out of **contiguous virtual** address space in a 32-bit app if you are allocating and freeing many large chunks of memory of disparate sizes. See also [What causes memory fragmentation in .NET](https://stackoverflow.com/a/5243624), [GCSettings.LargeObjectHeapCompactionMode Property](https://learn.microsoft.com/en-us/dotnet/api/system.runtime.gcsettings.largeobjectheapcompactionmode?view=net-6.0). – dbc Jul 21 '22 at 18:03
  • @dbc Ok. I just gave that a shot (set to CompactOnce and explicitly garbage collected before running the deserialization logic) and still ran into the OutOfMemoryException fwiw. Hopefully I went about it in the right way – pushkin Jul 21 '22 at 18:32
  • If you also got rid of the `reader.ReadLine()` and used `JsonTextReader.SupportMultipleContent`, then it may be that your 32 bit app simply doesn't have enough virtual address space to make a 200MB sized **contiguous** virtual allocation. You may need to reconsider your architecture. Do you really need the RTF in memory as a contiguous string? Could you stream it to disk temporarily instead? Can you share a [mcve] showing your actual JSON, and what you need to extract from it? – dbc Jul 21 '22 at 18:49
  • @dbc I am discussing workarounds with this other app. Most likely they'll just have to chunk the data. I actually tested the GC with the ReadLine() call still and never got JsonTextReader to work with SupportMultipleContent true. It works in a minimal example app, but in my case, it just hangs for some reason. – pushkin Jul 22 '22 at 18:34
  • If you can work with the other app's developers, then chunking + message framing is probably the way to go. – dbc Jul 22 '22 at 18:42
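
For anyone hitting the same wall: the chunking approach Andy describes above could look roughly like this on the sending side (the envelope type and field names are illustrative, not an agreed protocol):

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical chunk envelope; Index/Total let the receiver reassemble in order.
class Chunk
{
    public int Index { get; set; }
    public int Total { get; set; }
    public byte[] Payload { get; set; }
}

static class ChunkedSender
{
    // Splits a large payload into 5 MB pieces and serializes each one as its
    // own newline-delimited JSON message (~7 MB after Base64 expansion), so no
    // single allocation approaches the 100 MB that crashed the 32-bit process.
    public static IEnumerable<string> SerializeInChunks(byte[] data, int chunkSize = 5 * 1024 * 1024)
    {
        int total = (data.Length + chunkSize - 1) / chunkSize;
        for (int i = 0; i < total; i++)
        {
            int offset = i * chunkSize;
            int length = Math.Min(chunkSize, data.Length - offset);
            var payload = new byte[length];
            Buffer.BlockCopy(data, offset, payload, 0, length);
            yield return JsonConvert.SerializeObject(new Chunk { Index = i, Total = total, Payload = payload });
        }
    }
}
```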

0 Answers