
I am trying to parse the JSON incrementally, i.e. based on a condition.

Below is my JSON message; I am currently using JavaScriptSerializer to deserialize it.

string json = @"{"id":2,
"method":"add",
"params":
   {"object":
       {"name":"test"
        "id":"1"},
        "position":"1"}
  }";

JavaScriptSerializer js = new JavaScriptSerializer();
Message m = js.Deserialize<Message>(json);

The Message class is shown below:

public class Message
{
    public string id { get; set; }
    public string method { get; set; }
    public Params @params { get; set; }
    public string position { get; set; }
}

public class Params
{
    public string name { get; set; }
    public string id { get; set; }
}

The above code parses the message with no problems. But it parses the entire JSON at once. I want it to proceed with parsing only if the "method" parameter's value is "add". If it is not "add", then I don't want it to parse the rest of the message. Is there a way to do incremental parsing based on a condition in C#? (Environment: VS 2008 with .NET 3.5)

user591410

6 Answers


I have to admit I'm not as familiar with the JavaScriptSerializer, but if you're open to using Json.NET, it has a JsonReader that acts much like a DataReader.

using (var jsonReader = new JsonTextReader(myTextReader))
{
    while (jsonReader.Read())
    {
        // look for the "method" property and inspect its value
        if (jsonReader.TokenType == JsonToken.PropertyName && (string)jsonReader.Value == "method")
        {
            jsonReader.Read(); // advance to the property's value

            if ((string)jsonReader.Value != "add")
                break; // not "add": stop parsing the rest of the message

            // it's "add": do what you want
        }
    }
}
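
If you go that route, here is a self-contained sketch of one way to combine that token scan with a full deserialize. The two-pass approach and the ParseIfAdd name are my own illustration, not an API from Json.NET; Message is the class from the question:

using System.IO;
using Newtonsoft.Json;

static class ConditionalParser
{
    // Returns the deserialized Message only when its "method" is "add";
    // otherwise it stops reading tokens early and returns null.
    public static Message ParseIfAdd(string json)
    {
        using (var reader = new JsonTextReader(new StringReader(json)))
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.PropertyName && (string)reader.Value == "method")
                {
                    reader.Read(); // move to the value of "method"

                    if ((string)reader.Value != "add")
                        return null; // bail out without touching the rest

                    // condition met: deserialize the full message in a second pass
                    return JsonConvert.DeserializeObject<Message>(json);
                }
            }
        }
        return null; // no "method" property found
    }
}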
David Hoerster

Here are the simple, generic methods I use to parse, load, and create very large JSON files. The code uses the now pretty much standard Json.NET library. Unfortunately the documentation isn't very clear on how to do this, but it's not very hard to figure out either.

The code below assumes a scenario where you have a large number of objects that you want to serialize as a JSON array, and vice versa. We want to support very large files whose size is limited only by your storage device (not memory). So when serializing, the method takes an IEnumerable<T>, and while deserializing it returns the same. This way you can process the entire file without being limited by memory.

I've used this code on files of several GB with reasonable performance.

// Serialize a sequence of objects as a JSON array into the specified file
public static void SerializeSequenceToJson<T>(this IEnumerable<T> sequence, string fileName)
{
    using (var fileStream = File.CreateText(fileName))
        SerializeSequenceToJson(sequence, fileStream);
}

// Deserialize the specified file into an IEnumerable, assuming it contains an array of JSON objects
public static IEnumerable<T> DeserializeSequenceFromJson<T>(string fileName)
{
    using (var fileStream = File.OpenText(fileName))
        foreach (var responseJson in DeserializeSequenceFromJson<T>(fileStream))
            yield return responseJson;
}

// Utility methods that operate on streams instead of files
public static void SerializeSequenceToJson<T>(this IEnumerable<T> sequence, TextWriter writeStream, Action<T, long> progress = null)
{
    using (var writer = new JsonTextWriter(writeStream))
    {
        var serializer = new JsonSerializer();
        writer.WriteStartArray();
        long index = 0;
        foreach (var item in sequence)
        {
            if (progress != null)
                progress(item, index++);

            serializer.Serialize(writer, item);
        }
        writer.WriteEnd();
    }
}
public static IEnumerable<T> DeserializeSequenceFromJson<T>(TextReader readerStream)
{
    using (var reader = new JsonTextReader(readerStream))
    {
        var serializer = new JsonSerializer();
        if (!reader.Read() || reader.TokenType != JsonToken.StartArray)
            throw new Exception("Expected start of array in the deserialized json string");

        while (reader.Read())
        {
            if (reader.TokenType == JsonToken.EndArray) break;
            var item = serializer.Deserialize<T>(reader);
            yield return item;
        }
    }
}
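
For illustration, a hypothetical usage of the two methods above. The Item type, file name, and element count are made up for the example; the extension methods need to live in a static class, and you'll need using System.Linq:

public class Item { public int id { get; set; } }

// stream a large sequence to disk without materializing it in memory...
IEnumerable<Item> source = Enumerable.Range(0, 1000000).Select(i => new Item { id = i });
source.SerializeSequenceToJson("items.json");

// ...and read it back lazily, one object at a time
foreach (var item in DeserializeSequenceFromJson<Item>("items.json"))
    Console.WriteLine(item.id);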
Shital Shah

If you take a look at Json.NET, it provides a non-caching, forward-only JSON parser that will suit your needs.

See the JsonReader and JsonTextReader classes in the documentation.
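
A minimal sketch of that forward-only model, assuming the json string from the question (requires Newtonsoft.Json and System.IO; each Read() yields exactly one token, and nothing is buffered):

using (var reader = new JsonTextReader(new StringReader(json)))
{
    while (reader.Read())
    {
        // TokenType tells you what kind of token you're on; Value holds
        // the property name or primitive value (null for structural tokens)
        Console.WriteLine("{0}: {1}", reader.TokenType, reader.Value);
    }
}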

Kevin McCormick

I'm currently in hour 3 of an unknown timespan, watching 160 GB of JSON get deserialized into class objects. My memory use has been hanging steady at ~350 MB, and when I inspect the memory objects, it's all stuff the GC can take care of. Here's what I did:

    using (FileStream fs = File.Open("F:\\Data\\mysuperbig150GB.json", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (StreamReader sr = new StreamReader(fs))
    using (JsonReader reader = new JsonTextReader(sr))
    {
        JsonSerializer serializer = new JsonSerializer();
        MyJsonToClass result = serializer.Deserialize<MyJsonToClass>(reader);
    }

The problem is the deserialization. That 160GB of data is way bigger than what my PC can handle at once.

  1. I used a small snippet of the file (which is tough, when even just opening a 160 GB file is a challenge) and got a class structure via jsontocsharp.

  2. I made a specific class for the big collection in the auto-generated-via-json-tool class structure, and subclassed System.Collections.ObjectModel.ObservableCollection<T> instead of List<T>. They both implement IEnumerable, which I think is all the Newtonsoft JSON deserializer cares about.

  3. I went in and overrode InsertItem, like this:

     protected override void InsertItem(int index, Feature item)
     {
         // do something with the item that just got deserialized:
         // stick it in a database, etc.

         // let the base class add it, then immediately drop it so the
         // collection never holds more than one item
         base.InsertItem(index, item);
         RemoveItem(0);
     }
    

Again, my problems were partially about JSON deserialization speed, but beyond that I couldn't fit ~160 GB of JSON data into a collection. Even tightened up, it would be in the dozens-of-gigs area, way bigger than what .NET is going to be happy with.

Overriding InsertItem on ObservableCollection<T> is the only hook I'm aware of that you can handle as deserialization occurs; List<T>.Add() gives you nothing. I know this solution isn't "elegant", but it's working as I type this; a fuller sketch of the subclass is below.
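
For completeness, a minimal sketch of the whole subclass under the same assumptions: Feature comes from the auto-generated classes, and SaveToDatabase is a hypothetical placeholder for your per-item work.

using System.Collections.ObjectModel;

// streaming collection: each deserialized item is processed and then
// discarded, so memory use stays flat regardless of file size
public class StreamingFeatureCollection : ObservableCollection<Feature>
{
    protected override void InsertItem(int index, Feature item)
    {
        SaveToDatabase(item);          // hypothetical per-item sink
        base.InsertItem(index, item);  // satisfy the deserializer
        RemoveItem(0);                 // drop it again immediately
    }

    void SaveToDatabase(Feature item)
    {
        // replace with your own storage logic
    }
}

The property in the auto-generated root class that was typed List<Feature> then gets retyped to this collection, and the big Deserialize<MyJsonToClass> call streams through it.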

Eric

You'd be wanting a SAX-type parser for JSON:

http://en.wikipedia.org/wiki/Simple_API_for_XML

http://www.saxproject.org/event.html

SAX raises an event as it parses each piece of the document.

Doing something like that in JSON would (should) be pretty simple, given how simple the JSON syntax is.
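
To make the idea concrete, here is a sketch of a SAX-like facade built on Json.NET's JsonTextReader. The event names are invented for illustration; no such API ships with the library:

using System;
using System.IO;
using Newtonsoft.Json;

// SAX-style facade: raises an event per token instead of building a document
public class SaxLikeJsonParser
{
    public event Action<string> PropertyFound; // fired on each property name
    public event Action<object> ValueFound;    // fired on each primitive value

    public void Parse(TextReader input)
    {
        using (var reader = new JsonTextReader(input))
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.PropertyName && PropertyFound != null)
                    PropertyFound((string)reader.Value);
                else if (reader.Value != null && ValueFound != null)
                    ValueFound(reader.Value);
            }
        }
    }
}

A handler subscribed to these events could then decide, as soon as it sees the "method" value, whether the rest of the message is worth processing.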

This question might be of help: Is there a streaming API for JSON?

And another link: https://www.p6r.com/articles/2008/05/22/a-sax-like-parser-for-json/

Nicholas Carey

What's the reason for this approach? If your concern is performance, then it's likely "premature optimization": in other words, worrying about a problem that might not exist.

I would strongly urge you not to worry about this detail. Build your application, and then, if it isn't fast enough, use profiling tools to locate the actual bottlenecks; they likely won't be where you expect.

Focusing on performance before knowing it's an issue almost always leads to lost time and excessive code.

STW
  • Definitely a valid point, and I have a feeling that you're right, but it doesn't answer the OP's question. – Kevin McCormick Jan 26 '12 at 22:57
  • Thanks for your input. You're right, but the JSON message illustrated in my question is not the actual message; it looks like this (shown below). So I want to parse only what is needed, based on a condition. `{"id":2, "method":"add", "params": {"object": {"name":"test", "key2": "value2"...."key100":"value:100"}, "position":"1"}}` – user591410 Jan 26 '12 at 22:58
  • @user591410, if you parse an unneeded message fully, how many nanoseconds do you lose? – L.B Jan 26 '12 at 23:40
  • @STW These kinds of answers do not help the community. Let's say you are importing a few hundred gigabytes of data from a JSON file. I'd hate to kill the CLR garbage collector by first deserializing the entire JSON into POCOs before processing... – Jack Wester Feb 11 '12 at 11:39
  • How do you propose parsing streamed JSON (like https://dev.twitter.com/docs/streaming-apis for instance) without incremental parsing? – Yaur Jun 08 '13 at 16:01