
I'm facing a strange issue. I haven't had any problems in the past with deserializing huge JSON files, but now I have the following code:

    private JObject ReadJsonFile(string pathToFile)
    {
        JObject jsonObject = null;
        try
        {
            using (FileStream s = new FileStream(pathToFile, FileMode.Open, FileAccess.Read))
            using (StreamReader sr = new StreamReader(s))
            using (JsonReader reader = new JsonTextReader(sr))
            {
                JsonSerializer serializer = new JsonSerializer();
                jsonObject = serializer.Deserialize<JObject>(reader);
            }
        }
        catch (Exception exc)
        {
            _log.Error($"Error during reading file {exc}");
        }
        return jsonObject;
    }

This works fine, but from time to time at runtime the line `jsonObject = serializer.Deserialize<JObject>(reader);` throws an OutOfMemoryException, even though the JSON file is only around 20 MB, which is very strange. Has anyone had a similar issue?

The file has a lot of columns; when I open it in Notepad++ it shows 8 099 893 characters in one line. Maybe that is the cause?

kurkey
  • Are you running this code under 32 or 64 bit process? – Evk Oct 19 '20 at 09:17
  • You gain nothing by using `JsonReader` this way - you still deserialize the entire file in a single operation. Either use `JsonReader` as a *reader*, reading elements one by one, or change the format of the file to store a single JSON string per line. That's what event processing and logging libraries do. Appending a new element to a JSON array requires deserializing the entire array, adding the element and saving it again. With a single JSON string per line, all you need is `AppendLine`. Same for reading - you can read individual lines in a loop – Panagiotis Kanavos Oct 19 '20 at 09:17
  • What does the JSON file look like? How complex are the objects? – Panagiotis Kanavos Oct 19 '20 at 09:18
  • You might find [this post](https://stackoverflow.com/questions/43747477/how-to-parse-huge-json-file-as-stream-in-json-net/56411253#56411253) to be useful with regard to extracting specific features from a JSON document without parsing the entire file. – spender Oct 19 '20 at 09:19
  • One of the most common reasons for an OOM exception is adding items to lists inefficiently. Lists store data in internal buffers. When they are full, the list allocates a new buffer with double the size, copies the data and discards the old buffer which now needs to be GC'd. This can result in big delays and memory fragmentation. After a while memory can become so fragmented that the runtime can't find one contiguous memory block to allocate for a new buffer. That's why it's a *lot* faster to specify a capacity when creating a list. – Panagiotis Kanavos Oct 19 '20 at 09:24
  • Deserializing an array with a lot of items means adding them one by one, as the deserializer doesn't know how many items there are. I suspect your current code is rather slow already due to all those reallocations. – Panagiotis Kanavos Oct 19 '20 at 09:26
  • "The file has a lot of columns, when I open it in Notepad++ it shows 8 099 893 files." <-- what does this sentence mean? – Lasse V. Karlsen Oct 19 '20 at 09:41
  • The file has 14 properties saved; 12 of them are lists of quite complex objects – kurkey Oct 19 '20 at 10:10
  • Part of the question is why my code works fine for the better part of the runtime and then throws the exception out of the blue, even though the file size has not changed – kurkey Oct 19 '20 at 10:25
  • @kurkey Sometimes your system has spare memory and sometimes it doesn't – Mat J Oct 19 '20 at 10:55
  • @kurkey it's not out of the blue. I already explained what happens. The way you load this file creates a lot of temporary objects and fragments memory. You need to change this. Either change the format to a single JSON object per line, or use `JsonReader` to explicitly read elements in a streaming manner. Without knowing what your JSON looks like it's hard to offer better advice – Panagiotis Kanavos Oct 19 '20 at 10:56
  • @kurkey `spender` posted what's actually a duplicate in the comments – Panagiotis Kanavos Oct 19 '20 at 10:58
  • @PanagiotisKanavos Indeed, it is a duplicate. Mjolnir deployed. – spender Oct 19 '20 at 11:12
  • In the end we changed the way we serialize data into the file, which shrank it by more than 60% compared to the original, and reading the properties one by one did the trick (roughly the approach sketched below) – kurkey Oct 26 '20 at 11:49
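
For reference, here is a minimal sketch of the streaming read suggested in the comments above, assuming the same Newtonsoft.Json (Json.NET) stack as the question. Instead of materializing the whole document as one `JObject`, it walks the file with `JsonTextReader` and deserializes only one top-level property at a time. The helper name `ReadSingleProperty` and the property name `"Orders"` in the usage line are made up for illustration; substitute whichever of the file's actual properties you need.

    using System.IO;
    using Newtonsoft.Json;
    using Newtonsoft.Json.Linq;

    public static class JsonStreaming
    {
        // Reads a single named top-level property from a large JSON object
        // without loading the rest of the document into memory.
        public static JToken ReadSingleProperty(string pathToFile, string propertyName)
        {
            using (FileStream s = new FileStream(pathToFile, FileMode.Open, FileAccess.Read))
            using (StreamReader sr = new StreamReader(s))
            using (JsonTextReader reader = new JsonTextReader(sr))
            {
                while (reader.Read())
                {
                    if (reader.TokenType == JsonToken.PropertyName)
                    {
                        if ((string)reader.Value == propertyName)
                        {
                            reader.Read();                  // advance to the property's value
                            return JToken.ReadFrom(reader); // materialize only this value
                        }

                        // Not the property we want: Skip() moves past its entire value,
                        // nested objects/arrays included, so only top-level property
                        // names are ever seen by this loop.
                        reader.Skip();
                    }
                }
            }

            return null; // property not found
        }
    }

Usage:

    // Pull out just one of the large list properties and process it on its own:
    JToken orders = JsonStreaming.ReadSingleProperty(pathToFile, "Orders");

Called once per property, peak memory stays roughly at the size of the largest single property rather than the whole document; if even a single property is too big, the same loop can be pushed one level deeper to read array elements one at a time.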

0 Answers