I am writing a JSON deserialization routine for a fairly large JSON file (25 MB) that contains a lot of information. The part I care about is an array of words with many duplicates. With Newtonsoft.Json, I can deserialize the input as a stream:

using (var fs = new FileStream(@"myfile.json", FileMode.Open, FileAccess.Read))
using (var sr = new StreamReader(fs))
using (var reader = new JsonTextReader(sr))
{
    while (reader.Read())
    {
        // Read until I find the narrow subset I need, then parse and analyze it directly
        var obj = JObject.Load(reader); // Analyze this object
    }
}

This allows me to read the JSON in small parts, analyze each part, and check for duplicates as I go.

To do the same with ServiceStack.Text, I am doing something like:

using (var fs = new FileStream(@"myfile.json", FileMode.Open, FileAccess.Read))
using (var sr = new StreamReader(fs))
{
    var result = ServiceStack.Text.JsonSerializer.DeserializeFromReader<MyObject>(sr);
}

MyObject only contains the subset of the JSON I am interested in, but this approach creates a massive overhead, because I get back one big array that contains a lot of duplicates.

With the first method I can filter the duplicates out immediately, so they are never kept in memory.

The memory footprints of the two approaches (including the console program's overhead) are:

  • Newtonsoft: 30 MB
  • ServiceStack.Text: 215 MB

And the run times are:

  • Newtonsoft: 2.5 s
  • ServiceStack.Text: 1.5 s

The memory footprint is quite important, as I will be processing a lot of these files.

I understand that the ServiceStack method gives me type safety, but the memory footprint matters more to me.

Since ServiceStack.Text is a lot faster, I would like to know whether I can recreate the Newtonsoft example with ServiceStack.Text.

Edit (added the object I am trying to parse):

public class MyObject
{
    public List<List<Word>> Words { get; set; }
}

public class Word
{
    public string B { get; set; }
    public string W { get; set; }
    public string E { get; set; }
    public string P { get; set; }
}

My test file (which is representative of the use case) has 29,000 words, but only around 8,500 unique words. I am only analyzing this data, so I cannot change its structure. It is a file containing arrays of arrays of words.
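For reference, here is a minimal sketch of the Json.NET streaming-plus-deduplication approach described above. It rests on some assumptions: `WordStream` and `ReadUniqueWords` are hypothetical names, the words are taken to sit under a top-level `Words` property as in `MyObject`, and two words are treated as duplicates when their `W` values match (the real uniqueness rule may involve other fields). It is not a ServiceStack.Text solution, only an illustration of the behaviour I would like to reproduce.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public class Word
{
    public string B { get; set; }
    public string W { get; set; }
    public string E { get; set; }
    public string P { get; set; }
}

public static class WordStream
{
    // Hypothetical helper: streams the "Words" arrays and keeps only the
    // first Word seen for each distinct W value (assumed uniqueness key).
    public static List<Word> ReadUniqueWords(TextReader input)
    {
        var seen = new HashSet<string>();
        var unique = new List<Word>();
        var serializer = new JsonSerializer();

        using (var reader = new JsonTextReader(input))
        {
            bool insideWords = false; // true while we are within the "Words" value
            int arrayDepth = 0;       // array nesting level inside "Words"

            while (reader.Read())
            {
                if (!insideWords)
                {
                    // Skip everything until the "Words" property is reached.
                    if (reader.TokenType == JsonToken.PropertyName &&
                        (string)reader.Value == "Words")
                    {
                        insideWords = true;
                    }
                    continue;
                }

                switch (reader.TokenType)
                {
                    case JsonToken.StartArray:
                        arrayDepth++;
                        break;
                    case JsonToken.EndArray:
                        // Leaving the outermost array ends the "Words" section.
                        if (--arrayDepth == 0) insideWords = false;
                        break;
                    case JsonToken.StartObject:
                        // Materialize one Word at a time; duplicates are
                        // dropped immediately and become garbage.
                        var word = serializer.Deserialize<Word>(reader);
                        if (seen.Add(word.W)) unique.Add(word);
                        break;
                }
            }
        }
        return unique;
    }
}
```

It would be called as `WordStream.ReadUniqueWords(new StreamReader(fs))`, so only the unique words (about 8,500 in my test file) stay referenced instead of all 29,000.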

Kristian Barrett
  • Can you give an idea of what `MyObject` looks like? Is the data you are processing getting stored in some sort of collection? If so, you might be able to do something with a custom collection. – dbc Jan 10 '17 at 19:03
  • @dbc I have added the MyObject class to the question now. – Kristian Barrett Jan 10 '17 at 19:07
  • From the [source code](https://github.com/ServiceStack/ServiceStack.Text/blob/master/src/ServiceStack.Text/JsonSerializer.Generic.cs#L41) it looks like `JsonSerializer.DeserializeFromReader(TextReader reader)` just loads the entire JSON into a string: `return DeserializeFromString(reader.ReadToEnd());`. And internally the various [parse methods](https://github.com/ServiceStack/ServiceStack.Text/blob/master/src/ServiceStack.Text/Common/DeserializeListWithElements.cs#L120) work with strings. So, I don't *think* so. – dbc Jan 10 '17 at 19:26
  • With Json.NET, you might try deserializing each individual object to a `Word` using the technique from [this answer](http://stackoverflow.com/a/32237819/3744182). [This page](http://stackify.com/top-11-json-performance-usage-tips/) claims LINQ-to-JSON is 20% slower than deserialization. See [here](http://stackoverflow.com/q/26380184/3744182) for more ideas. – dbc Jan 10 '17 at 19:38
  • @dbc that is how I am doing it with Json.NET. I was just wondering if I could do the same with ServiceStack.Text. I am running this analysis in AWS Lambda, so the more efficient I can make it, the less I will have to pay for it. – Kristian Barrett Jan 10 '17 at 19:41
  • @KristianBarrett I had no problem with your approach, `var result = ServiceStack.Text.JsonSerializer.DeserializeFromReader(sr);`, using a simple class, just FYI – Brian Ogden Feb 05 '18 at 23:50

0 Answers