I have used the "JsonConvert.DeserializeObject(json)" method of Json.NET so far, which has worked quite well; to be honest, I haven't needed anything more than this.

I am working on a background (console) application which constantly downloads the JSON content from different URLs, then deserializes the result into a list of .NET objects.

    using (WebClient client = new WebClient())
    {
        string json = client.DownloadString(stringUrl);

        var result = JsonConvert.DeserializeObject<List<Contact>>(json);
    }

The simple code snippet above probably doesn't look perfect, but it does the job. When the file is large (15,000 contacts, a 48 MB file), JsonConvert.DeserializeObject fails: the line throws a JsonReaderException.

The downloaded JSON content is an array, and this is what a sample looks like. Contact is a container class for the deserialized JSON object.

[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]

My initial guess is it runs out of memory. Just out of curiosity, I tried to parse it as JArray which caused the same exception too.

I have started to dive into Json.NET documentation and read similar threads. As I haven't managed to produce a working solution yet, I decided to post a question here.

UPDATE: While deserializing line by line, I got the same error: "Additional text encountered after finished reading JSON content: [. Path '', line 600003, position 1." So I downloaded two of the files and checked them in Notepad++. I noticed that if the array has more than 12,000 elements, the "[" is closed after the 12,000th element and another array starts. In other words, the JSON looks exactly like this:

[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
Peter Mortensen
Yavar Hasanov
  • `and the line throws an exception type of JsonReaderException.` What is the exception message? Any inner exception? – Eser Aug 26 '15 at 13:05
  • Additional text encountered after finished reading JSON content: [. Path '', line 600003, position 1." - this is the exception message – Yavar Hasanov Aug 26 '15 at 13:09
  • @Yavarski Are you sure your JSON is valid? – Yuval Itzchakov Aug 26 '15 at 13:09
  • @Yavarski As you can see, it is not related to the size of the JSON. There are some extra characters at the end of your JSON. – Eser Aug 26 '15 at 13:11
  • Are you saying _one input_ is 48MB or you are combining several inputs into one that reaches 48MB? – D Stanley Aug 26 '15 at 13:11
  • There's something wrong with the format. – Christo S. Christov Aug 26 '15 at 13:15
  • Consider using Async. It improves performance for the processes. – Frederick Marcoux Aug 26 '15 at 13:23
  • I am using a third-party API which generates a link to the list of contacts (a JSON array). The file I get is a JSON file and it is constructed as posted above. @YuvalItzchakov, I believe it's valid JSON because I have repeated this for 100 different URLs and never had an issue. However, the JSON arrays contained fewer than 10,000 contacts in all of them. – Yavar Hasanov Aug 26 '15 at 13:32
  • @DStanley It's a downloadable link. For instance, the current file I'm working with is about 48 MB. What I assume is that the reader runs out of memory while reading the JSON, probably in the middle of it, and that's why the exception is thrown with that message. I may be totally wrong, but this is what comes to mind for now. – Yavar Hasanov Aug 26 '15 at 13:32
  • If you think you're running out of memory, you could try processing the JSON incrementally instead of deserializing into one giant list. See [Deserialize json array stream one item at a time](http://stackoverflow.com/q/20374083/10263). – Brian Rogers Aug 26 '15 at 14:03
  • Can you try specifying the encoding to UTF8? There might be some special characters messing with the json format. You can do this by using client.Encoding = Encoding.UTF8; – Andres Castro Aug 26 '15 at 14:12
  • Thanks @BrianRogers it really helped. I am updating the question now. – Yavar Hasanov Aug 26 '15 at 15:36
  • Your source data is two arrays, but you are telling it to deserialize into a single array (List). Since you're already going line by line, you should merge the two arrays. – Cory Charlton Aug 26 '15 at 20:36
  • If it runs out of memory, shouldn't an OutOfMemoryException be thrown? I don't think JSON.NET would be so stupid to catch that kind of exception and return invalid data. – Thomas Weller Aug 26 '15 at 21:03

4 Answers


As you've correctly diagnosed in your update, the issue is that the JSON has a closing ] followed immediately by an opening [ to start the next set. This format makes the JSON invalid when taken as a whole, and that is why Json.NET throws an error.
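The failure can be reproduced without any download. The following is only a minimal sketch: the `Contact` properties are taken from the sample JSON, while the `Repro` class and the `TryDeserialize` helper are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

public class Contact
{
    [JsonProperty("firstname")] public string FirstName { get; set; }
    [JsonProperty("lastname")] public string LastName { get; set; }
}

public static class Repro
{
    // Hypothetical helper: returns the exception message if the reader
    // rejects the input, or null if deserialization succeeds.
    public static string TryDeserialize(string json)
    {
        try
        {
            JsonConvert.DeserializeObject<List<Contact>>(json);
            return null;
        }
        catch (JsonReaderException ex)
        {
            return ex.Message;
        }
    }

    public static void Main()
    {
        // Two complete arrays back to back, as in the question's update.
        string json =
            "[{\"firstname\":\"a\",\"lastname\":\"b\"}]\n" +
            "[{\"firstname\":\"c\",\"lastname\":\"d\"}]";

        // The second "[" is additional text after the first array has ended.
        Console.WriteLine(TryDeserialize(json));
    }
}
```

The input here is tiny, which suggests the exception comes from the shape of the content and not from its size.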

Fortunately, this problem seems to come up often enough that Json.NET actually has a special setting to deal with it. If you use a JsonTextReader directly to read the JSON, you can set the SupportMultipleContent flag to true, and then use a loop to deserialize each item individually.

This should allow you to process the non-standard JSON successfully and in a memory efficient manner, regardless of how many arrays there are or how many items in each array.

    using (WebClient client = new WebClient())
    using (Stream stream = client.OpenRead(stringUrl))
    using (StreamReader streamReader = new StreamReader(stream))
    using (JsonTextReader reader = new JsonTextReader(streamReader))
    {
        reader.SupportMultipleContent = true;

        var serializer = new JsonSerializer();
        while (reader.Read())
        {
            if (reader.TokenType == JsonToken.StartObject)
            {
                Contact c = serializer.Deserialize<Contact>(reader);
                Console.WriteLine(c.FirstName + " " + c.LastName);
            }
        }
    }

Full demo here: https://dotnetfiddle.net/2TQa8p
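To try the same idea without a network call, the loop can be wrapped in an iterator and fed from a string. This is a sketch under assumptions, not part of the original answer: `ContactStream` and `ReadContacts` are hypothetical names, and a `StringReader` stands in for the `StreamReader` over the HTTP response.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public class Contact
{
    [JsonProperty("firstname")] public string FirstName { get; set; }
    [JsonProperty("lastname")] public string LastName { get; set; }
}

public static class ContactStream
{
    // Hypothetical helper: yields one Contact at a time from any TextReader,
    // so the caller never has to hold the full list in memory.
    public static IEnumerable<Contact> ReadContacts(TextReader input)
    {
        using (var reader = new JsonTextReader(input))
        {
            // Allow several top-level values (here: several arrays) in one input.
            reader.SupportMultipleContent = true;

            var serializer = new JsonSerializer();
            while (reader.Read())
            {
                // Array brackets are skipped; each object becomes one Contact.
                if (reader.TokenType == JsonToken.StartObject)
                {
                    yield return serializer.Deserialize<Contact>(reader);
                }
            }
        }
    }

    public static void Main()
    {
        // Two arrays back to back, mirroring the format from the question.
        string json =
            "[{\"firstname\":\"a\",\"lastname\":\"b\"}]\n" +
            "[{\"firstname\":\"c\",\"lastname\":\"d\"}]";

        foreach (Contact c in ReadContacts(new StringReader(json)))
        {
            Console.WriteLine(c.FirstName + " " + c.LastName);
        }
    }
}
```

Returning an `IEnumerable<Contact>` leaves it to the caller to decide whether to process items one by one or to collect them into a list.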

Peter Mortensen
Brian Rogers

Json.NET supports deserializing directly from a stream. Here is a way to deserialize your JSON using a StreamReader reading the JSON string one piece at a time instead of having the entire JSON string loaded into memory.

using (WebClient client = new WebClient())
{
    using (StreamReader sr = new StreamReader(client.OpenRead(stringUrl)))
    {
        using (JsonReader reader = new JsonTextReader(sr))
        {
            JsonSerializer serializer = new JsonSerializer();

            // Read the JSON from the response stream.
            // The JSON text is never held in memory as one large string, because only a
            // small piece is read from the HTTP response at a time.
            IList<Contact> result = serializer.Deserialize<List<Contact>>(reader);
        }
    }
}

Reference: JSON.NET Performance Tips

Kristian Vukusic
  • This code may not load the entire stream into memory, but will certainly load the entire list of contacts into memory. Unless the Contact object throws away large amounts of data from the stream, you've just pushed your memory problem downstream. – John Bledsoe Dec 06 '17 at 16:05

I have done a similar thing in Python for a 5 GB file. I downloaded the file to a temporary location and read it line by line to form a JSON object, similar to how SAX works.

For C# using Json.NET, you can download the file, open it with a StreamReader, pass that reader to a JsonTextReader, and parse it into a JToken (a JObject or JArray, depending on the root) using JToken.ReadFrom(your JsonTextReader object).
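A rough sketch of that approach might look like the following. The `TokenDemo` class and `Load` helper are hypothetical names, and a `StringReader` stands in for the `StreamReader` over the downloaded file. Two caveats are worth stating as assumptions about how this behaves: `JToken.ReadFrom` builds the whole token tree in memory, and it reads a single top-level value from the reader.

```csharp
using System;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class TokenDemo
{
    // Hypothetical helper: parses one top-level JSON value into a JToken.
    public static JToken Load(TextReader input)
    {
        using (var reader = new JsonTextReader(input))
        {
            return JToken.ReadFrom(reader);
        }
    }

    public static void Main()
    {
        // In real use this would be a StreamReader over the downloaded file.
        string json = "[{\"firstname\":\"a\",\"lastname\":\"b\"}]";
        JToken root = Load(new StringReader(json));

        // The root is an array here, so enumerating it visits each contact object.
        foreach (JToken item in root)
        {
            Console.WriteLine((string)item["firstname"] + " " + (string)item["lastname"]);
        }
    }
}
```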

Peter Mortensen
nixdaemon
  • It makes sense. I will try this and post the updates here. Thanks a mil. – Yavar Hasanov Aug 26 '15 at 13:37
  • Look at Kristian's answer below. He has the code implementation; it's a pretty similar concept to what I have explained above, but I like Kristian's approach better :) – nixdaemon Aug 26 '15 at 21:13

This might still be relevant to some now that the "new" System.Text.Json is out.

await using FileStream file = File.OpenRead("files/data.json");
var options = new JsonSerializerOptions {
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase
};

// Switch the JsonNode type with one of your own if
// you have a specific type you want to deserialize to.
IAsyncEnumerable<JsonNode?> enumerable = JsonSerializer.DeserializeAsyncEnumerable<JsonNode>(file, options);

await foreach (JsonNode? obj in enumerable) {
    var firstname = obj?["firstname"]?.GetValue<string>();
}

If you're interested in more, such as how to parse zipped JSON, there's this blog post that I wrote: Parsing 60GB Json Files using Streams in .NET.

Millard