
I am doing some testing with Newtonsoft Json.NET and am running into issues with large JSON datasets.

I have a dataset which is 400 MB in size, and no matter how I parse it I keep getting out-of-memory exceptions. I tried the standard `Parse` method and also the `StreamReader` approach where I walk the tokens myself. Either way, at some point I end up with an out-of-memory exception.

The stack trace indicates the problem is centered on the use of `Int32` to index the document (see below). I thought using `JsonTextReader` would solve that problem, but that doesn't appear to be the case.

Here is my stream request from the server:

public Task<Stream> GetAsync(string command)
{
    Task<Stream> promise;
    using (var handler = new HttpClientHandler { Credentials = new NetworkCredential(username, password) })
    {
        var client = new HttpClient(handler);
        promise = client.GetStreamAsync(command);
    }

    return promise;
}

Then, I start the work in this way:

using (Stream s = task.GetAwaiter().GetResult())
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.PropertyName && reader.Value.ToString() == "Values")
        {
            reader.Read();
            if (reader.TokenType != JsonToken.StartArray)
                break;
            while (reader.Read())
            {
                // Do more parsing
            }
        }
    }
}
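
For the "Do more parsing" part I continue walking the tokens in a nested loop similar to the outer one. The effect I am after is to materialize one array element at a time and let go of it before moving on, roughly like this sketch (`Person` and `Process` are placeholders for illustration, not my real types; my actual code reads the tokens directly rather than using `JsonSerializer`):

class Person
{
    public string Name { get; set; }
    public string Address { get; set; }
    public Dictionary<string, string> Relatives { get; set; }
}

// Inside the "Values" array: deserialize each element on its own, use it,
// then drop the reference so nothing accumulates across the whole document.
var serializer = new JsonSerializer();
while (reader.Read() && reader.TokenType != JsonToken.EndArray)
{
    if (reader.TokenType == JsonToken.StartObject)
    {
        Person item = serializer.Deserialize<Person>(reader);
        Process(item); // placeholder for the real per-item work
    }
}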

Is there some configuration in Json.NET I need to set to avoid this problem? Or do I need to break the response from the server into smaller chunks? If so, how do I deal with partial JSON?

Here is the stack trace:

   at System.String.CtorCharArrayStartLength(Char[] value, Int32 startIndex, Int32 length)
   at Newtonsoft.Json.Utilities.StringReference.ToString()
   at Newtonsoft.Json.JsonTextReader.ParseString(Char quote)
   at Newtonsoft.Json.JsonTextReader.ParseValue()
   at Newtonsoft.Json.JsonTextReader.ReadInternal()
   at Newtonsoft.Json.JsonTextReader.Read()
user3072517
  • See [Deserialize json array stream one item at a time](http://stackoverflow.com/q/20374083/10263). Also make sure that when you are working on a particular item from the JSON you are not adding it to a list or some other structure that stays in memory. That defeats the whole purpose of incremental processing. You want to process the item, do what you need to do with it (e.g. add it to a database) then let go of all references to it before going onto the next item. – Brian Rogers Jan 22 '15 at 21:40
  • What does your JSON look like? – dbc Jan 22 '15 at 21:47
  • I don't think I am doing things right anyway...my understanding is that GetResults() waits for the asynchronous response to complete before continuing...so this may actually be where the hang up is. Although I believe the Json.Net coding might be correct, because of the GetResults() it is not "technically" streaming. Anyway, the json looks like this: – user3072517 Jan 22 '15 at 22:54
  • Rats! Hit enter by accident....Here is the JSON: `{ "id": "Data", "Values": [ { "Name": "Fred", "Address" : "123 Blake Street", "Relatives" : { "Father" : "Bill", "Mother" : "Mary" } }, { ... } ] } ` – user3072517 Jan 22 '15 at 22:57
  • Your Json.NET coding looks good. What do you do in `Do more parsing`? – dbc Jan 23 '15 at 00:44
  • I just continue to parse the document in a structure similar to what I posted. It works on one dataset with 839,050 items, but crashes on another dataset with only 415,146 items, but there are more "Relatives". The first dataset is 173MB and the second one is 404MB. – user3072517 Jan 23 '15 at 00:58
  • [This post](https://stackoverflow.com/questions/22341042/download-an-image-using-the-least-amount-of-memory) suggests that `GetStreamAsync()` does not load the entire stream into memory. So, I'll second what @BrianRogers said - perhaps you are saving references to every parsed item as you go? – dbc Jan 23 '15 at 03:09

0 Answers