39

I serialize an array of large objects to a JSON HTTP response stream. Now I want to deserialize these objects from the stream one at a time. Are there any C# libraries that will let me do this? I've looked at Json.NET, but it seems I'd have to deserialize the complete array of objects at once.

[{large json object},{large json object}.....]

Clarification: I want to read one json object from the stream at a time and deserialize it.

ZNS

5 Answers

61

In order to read the JSON incrementally, you'll need to use a JsonTextReader in combination with a StreamReader. But you don't necessarily have to read all the JSON manually from the reader. You should be able to leverage the LINQ-to-JSON API to load each large object from the reader so that you can work with it more easily.

For a simple example, say I had a JSON file that looked like this:

[
  {
    "name": "foo",
    "id": 1
  },
  {
    "name": "bar",
    "id": 2
  },
  {
    "name": "baz",
    "id": 3
  }
]

Code to read it incrementally from the file might look something like the following. (In your case you would replace the FileStream with your response stream.)

using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

using (FileStream fs = new FileStream(@"C:\temp\data.json", FileMode.Open, FileAccess.Read))
using (StreamReader sr = new StreamReader(fs))
using (JsonTextReader reader = new JsonTextReader(sr))
{
    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.StartObject)
        {
            // Load each object from the stream and do something with it
            JObject obj = JObject.Load(reader);
            Console.WriteLine(obj["id"] + " - " + obj["name"]);
        }
    }
}

Output of the above would look like this:

1 - foo
2 - bar
3 - baz
Brian Rogers
  • This is essentially the same solution I came up with, except I do a `new JsonSerializer().Deserialize(reader);` instead of `JObject.Load`. I'm not exactly sure how JsonTextReader manages the stream data, though. – ZNS Dec 04 '13 at 22:34
  • You can always peruse the [source code](http://json.codeplex.com/SourceControl/latest#trunk/Src/Newtonsoft.Json/JsonTextReader.cs) to find out. – Brian Rogers Dec 06 '13 at 00:40
  • This got me going in the right direction, but I had to do `JsonSerializer.Create().Deserialize(reader, desiredType);` since `JObject.Load` never worked (nor threw an error; it was most bizarre). – Brad Jan 03 '15 at 03:16
  • Remember, `JObject.Load` is not as performant as `serializer.Deserialize`, so prefer `Deserialize`. – Mohamed Mansour May 03 '18 at 20:13
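As the comments suggest, you can also hand the positioned reader straight to a JsonSerializer and materialize each array element into a typed POCO instead of a JObject. A minimal sketch of that variant, assuming Json.NET and a hypothetical `Item` class matching the sample JSON above:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical POCO matching the sample JSON in the answer above.
public class Item
{
    public string name { get; set; }
    public int id { get; set; }
}

public static class StreamedReader
{
    // Yields one deserialized object at a time, without ever
    // loading the whole array into memory.
    public static IEnumerable<Item> ReadItems(TextReader input)
    {
        var serializer = new JsonSerializer();
        using (var reader = new JsonTextReader(input))
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    // Consumes exactly one object; the reader is left on its
                    // EndObject token, so the outer loop continues cleanly.
                    yield return serializer.Deserialize<Item>(reader);
                }
            }
        }
    }

    public static void Main()
    {
        // In your case, wrap the response stream in a StreamReader instead.
        const string json = "[{\"name\":\"foo\",\"id\":1},{\"name\":\"bar\",\"id\":2}]";
        foreach (var item in ReadItems(new StringReader(json)))
            Console.WriteLine(item.id + " - " + item.name);
    }
}
```

This skips the intermediate JObject allocation, which is why the last comment above recommends `Deserialize` for performance.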
4

I have simplified one of the samples/tests of my parser/deserializer to answer this question's use case more straightforwardly.

Here is the test data:

https://github.com/ysharplanguage/FastJsonParser/tree/master/JsonTest/TestData

(cf. fathers.json.txt)

And here is the sample code:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text;

    // FastJsonParser's namespace (unrelated to Microsoft's later BCL library of the same name)
    using System.Text.Json;

//...

    public class FathersData
    {
        public Father[] fathers { get; set; }
    }

    public class Someone
    {
        public string name { get; set; }
    }

    public class Father : Someone
    {
        public int id { get; set; }
        public bool married { get; set; }
        // Lists...
        public List<Son> sons { get; set; }
        // ... or arrays for collections, that's fine:
        public Daughter[] daughters { get; set; }
    }

    public class Child : Someone
    {
        public int age { get; set; }
    }

    public class Son : Child
    {
    }

    public class Daughter : Child
    {
        public string maidenName { get; set; }
    }

//...

    static void FilteredFatherStreamTestSimplified()
    {
        // Get our parser:
        var parser = new JsonParser();

        // (Note this will be invoked thanks to the "filters" dictionary below)
        Func<object, object> filteredFatherStreamCallback = obj =>
        {
            Father father = (obj as Father);
            // Output only the individual fathers that the filters decided to keep (i.e., when obj.Type equals typeof(Father)),
            // but don't output (even once) the resulting array (i.e., when obj.Type equals typeof(Father[])):
            if (father != null)
            {
                Console.WriteLine("\t\tId : {0}\t\tName : {1}", father.id, father.name);
            }
            // Do not project the filtered data in any specific way otherwise,
            // just return it deserialized as-is:
            return obj;
        };

        // Prepare our filter, and thus:
        // 1) we want only the last five (5) fathers (array index in the resulting "Father[]" >= 29,995),
        // (assuming we somehow have prior knowledge that the total count is 30,000)
        // and for each of them,
        // 2) we're interested in deserializing them with only their "id" and "name" properties
        var filters = 
            new Dictionary<Type, Func<Type, object, object, int, Func<object, object>>>
            {
                // We don't care about anything but these 2 properties:
                {
                    typeof(Father), // Note the type
                    (type, obj, key, index) =>
                        ((key as string) == "id" || (key as string) == "name") ?
                        filteredFatherStreamCallback :
                        JsonParser.Skip
                },
                // We want to pick only the last 5 fathers from the source:
                {
                    typeof(Father[]), // Note the type
                    (type, obj, key, index) =>
                        (index >= 29995) ?
                        filteredFatherStreamCallback :
                        JsonParser.Skip
                }
            };

        // Read, parse, and deserialize fathers.json.txt in a streamed fashion,
        // and using the above filters, along with the callback we've set up:
        using (var reader = new System.IO.StreamReader(FATHERS_TEST_FILE_PATH))
        {
            FathersData data = parser.Parse<FathersData>(reader, filters);

            System.Diagnostics.Debug.Assert
            (
                (data != null) &&
                (data.fathers != null) &&
                (data.fathers.Length == 5)
            );
            foreach (var i in Enumerable.Range(29995, 5))
                System.Diagnostics.Debug.Assert
                (
                    (data.fathers[i - 29995].id == i) &&
                    !String.IsNullOrEmpty(data.fathers[i - 29995].name)
                );
        }
        Console.ReadKey();
    }

The rest of the bits are available here:

https://github.com/ysharplanguage/FastJsonParser

HTH,

YSharp
0

This is my solution (combined from different sources, but mainly based on Brian Rogers' answer) to convert a huge JSON file (an array of objects) to an XML file, for any generic object.

JSON looks like this:

   {
      "Order": [
          { order object 1 },
          { order object 2 },
          { ... },
          { order object 10000 }
      ]
   }

Output XML:

<Order>...</Order>
<Order>...</Order>
<Order>...</Order>

C# code:

using System.Xml;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

using (StreamWriter sw = new StreamWriter(xmlFile))
using (FileStream fs = new FileStream(jsonFile, FileMode.Open, FileAccess.Read))
using (StreamReader sr = new StreamReader(fs))
using (JsonTextReader reader = new JsonTextReader(sr))
{
    //sw.Write("<root>");
    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.StartArray)
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    JObject obj = JObject.Load(reader);
                    XmlDocument doc = JsonConvert.DeserializeXmlNode(obj.ToString(), "Order");
                    sw.Write(doc.InnerXml); // a line of XML code <Order>...</Order>
                    sw.Write("\n");
                    // Note: this approach produces an XML fragment rather than
                    // a single-rooted document; uncomment the "<root>" writes
                    // above and below to make it a valid XML document.
                }
            }
        }
    }
    //sw.Write("</root>");
}
serop
0

With Cinchoo ETL, an open-source library, you can parse large JSON efficiently with a low memory footprint, since the objects are constructed and returned in a stream-based pull model:

using (var p = new ChoJSONReader(** YOUR JSON FILE **))
{
    foreach (var rec in p)
    {
        Console.WriteLine($"Name: {rec.name}, Id: {rec.id}");
    }
}

For more information, please visit the CodeProject article.

Hope it helps.

Cinchoo
0

I know that the question is old, but it appears in Google search results, and I needed the same thing recently. Another way to deal with streamed deserialization is to use JsonSerializer.DeserializeAsyncEnumerable from System.Text.Json (available in .NET 6 and later).

Usage looks like:

await using (var readStream = File.Open(filePath, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    await foreach (T item in JsonSerializer.DeserializeAsyncEnumerable<T>(readStream))
    {                        
        // do something with the item
    }
}
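For a self-contained illustration of the same idea, here is a sketch assuming .NET 6+ and a hypothetical `Item` record matching the array's element shape (any Stream works in place of the MemoryStream, including an HTTP response stream):

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical record matching the JSON array's element shape.
public record Item(string name, int id);

public static class AsyncStreamDemo
{
    public static async Task Main()
    {
        var json = "[{\"name\":\"foo\",\"id\":1},{\"name\":\"bar\",\"id\":2}]";
        using var stream = new MemoryStream(Encoding.UTF8.GetBytes(json));

        // Items are yielded one at a time as the stream is read;
        // the whole array is never materialized in memory at once.
        await foreach (Item? item in JsonSerializer.DeserializeAsyncEnumerable<Item>(stream))
        {
            Console.WriteLine($"{item!.id} - {item.name}");
        }
    }
}
```

Note that the enumerable yields `T?`, so a null check (or the null-forgiving operator, as above) is appropriate for each element.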
Daniil