I created a small recursive C# app that parses an unknown JSON string to find particular key/value pairs using Newtonsoft.Json.dll. It works fine on small JSON strings, but takes a really long time when the JSON is bigger: a 3.5 MB JSON file with 15K+ lines takes more than 3 minutes to parse. Parsing the same file using a regex takes less than 1 second. Does JsonConvert.DeserializeObject() really take that long?

    string json = @"{""origin-of_error"" : ""error_message"",""foo"" : ""bar""}";
    static void GetJsonValue (string json, string findStr = "foo")
    {
        try
        {

            if (Regex.Match(json, @"^\[", RegexOptions.Multiline).Success)
            {
                // JSON string Array []
                var jArr = JsonConvert.DeserializeObject<List<Object>>(json);
                foreach (var jLine in jArr) GetJsonValue(jLine.ToString());
            }
            else
            {
                // JSON string KEY:VALUE
                var jLog = JsonConvert.DeserializeObject<Dictionary<String, Object>>(json);               
                foreach (KeyValuePair<string, object> jEntry in jLog)
                {
                    if (jEntry.Key.ToString() == findStr) Console.WriteLine("MATCH:" +  jEntry.ToString());
                    GetJsonValue(jEntry.Value.ToString());
                }                       
            }
        }
        catch { }
    }
kestasj
  • You can't compare a Regex match with creating objects; it's an order of magnitude more complicated. Do you have a question, or is this just a rant in disguise? – Liam Jun 14 '16 at 16:10
  • Note that you're parsing the JSON many, many times here too. Why not parse it *once* to a `JObject` and then look through that recursively? – Jon Skeet Jun 14 '16 at 16:13
  • That was the easiest way. I was not expecting it to be so time-consuming. – kestasj Jun 14 '16 at 16:20
  • 1) Have you tried to [profile](https://stackoverflow.com/questions/3927) your app? What did you find? 2) You should not load a 3.5MB text file into a single string, see [Performance Tips: Optimize Memory Usage](http://www.newtonsoft.com/json/help/html/Performance.htm). 3) It looks like you are using huge amounts of intermediate memory for JTokens and strings. Instead, consider adopting the approach of [Parsing large json file in .NET](https://stackoverflow.com/questions/32227436). You will need to test whether the root is an array or object and adapt accordingly. – dbc Jun 14 '16 at 16:46
  • Yes, I learned about array versus object the hard way. Thank you, k – kestasj Jun 14 '16 at 16:50

1 Answer

It's not really clear what your problem is, since you don't include a sample of your actual JSON. However, it appears you are trying to sequentially deserialize the values in a large JSON array, or the values of the key/value pairs in a large JSON object, stored in a file on disk.

That being said, some recommendations can be made:

  1. Rather than loading the JSON into a large (3.5 MB) string, you should stream directly from the file, as is explained in Performance Tips: Optimize Memory Usage.

  2. Your current approach seems to be to deserialize to a large temporary Dictionary<string, object> or List<object>, then, for each value, reserialize it to a string and deserialize that again. Every nested value is therefore converted back to text and reparsed once per level of nesting, which will not be performant.

    If the values you are trying to deserialize are complex objects, you can adapt the solution from Parsing large json file in .NET, enhanced to handle the fact that your root JSON container might be an array or an object.

    Thus, instead, use:

    // Requires: using System.Collections.Generic; using System.IO; using Newtonsoft.Json;
    public static partial class JsonExtensions
    {
        public static IEnumerable<T> DeserializeValues<T>(Stream stream)
        {
            return DeserializeValues<T>(new StreamReader(stream));
        }
    
        public static IEnumerable<T> DeserializeValues<T>(TextReader textReader)
        {
            var serializer = JsonSerializer.CreateDefault();
            var reader = new JsonTextReader(textReader);
            reader.SupportMultipleContent = true;
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartArray)
                {
                    while (reader.Read())
                    {
                        if (reader.TokenType == JsonToken.Comment)
                            continue; // Do nothing
                        else if (reader.TokenType == JsonToken.EndArray)
                            break; // Break from the loop
                        else
                            yield return serializer.Deserialize<T>(reader);
                    }
                }
                else if (reader.TokenType == JsonToken.StartObject)
                {
                    while (reader.Read())
                    {
                        if (reader.TokenType == JsonToken.Comment)
                            continue; // Do nothing
                        else if (reader.TokenType == JsonToken.PropertyName)
                            continue; // Eat the property name
                        else if (reader.TokenType == JsonToken.EndObject)
                            break; // Break from the loop
                        else
                            yield return serializer.Deserialize<T>(reader);
                    }
                }
            }
        }
    }
    
  3. If the values you are trying to deserialize are primitives (i.e. just strings, as is shown in your example), then you should skip deserializing entirely, and read them directly. Deserializing requires creation and processing of a data contract, usually through reflection. Reading directly skips this complexity.

    Thus you could do:

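    // Requires: using System.Collections.Generic; using System.IO; using Newtonsoft.Json; using Newtonsoft.Json.Linq;
    public static partial class JsonExtensions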
    {
        public static bool IsPrimitive(this JsonToken tokenType)
        {
            switch (tokenType)
            {
                case JsonToken.Integer:
                case JsonToken.Float:
                case JsonToken.String:
                case JsonToken.Boolean:
                case JsonToken.Undefined:
                case JsonToken.Null:
                case JsonToken.Date:
                case JsonToken.Bytes:
                    return true;
                default:
                    return false;
            }
        }
    
        public static IEnumerable<string> ReadPrimitives(Stream stream)
        {
            return ReadPrimitives(new StreamReader(stream));
        }
    
        public static IEnumerable<string> ReadPrimitives(TextReader textReader)
        {
            var reader = new JsonTextReader(textReader);
            reader.SupportMultipleContent = true;
            while (reader.Read())
            {
                if (reader.TokenType.IsPrimitive())
                {
                    if (reader.TokenType == JsonToken.String)
                        yield return reader.Value.ToString(); // No need for conversion
                    else
                        yield return (string)JValue.Load(reader); // Convert to string.
                }
            }
        }
    }
    

For both #2 and #3, you would pass the Stream or StreamReader created by opening your file on disk.
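
To tie this back to the original goal of finding the values of a particular key, here is a minimal sketch of the same streaming idea. It assumes that only primitive values (strings, numbers, booleans) need to be reported. `JsonKeyFinder` and `FindValues` are hypothetical names chosen for illustration; they are not part of Json.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using Newtonsoft.Json;

public static class JsonKeyFinder
{
    // Hypothetical helper, not part of Json.NET: reads the JSON once, token by token,
    // and yields the primitive value of every property named `key`, at any depth.
    public static IEnumerable<string> FindValues(TextReader textReader, string key)
    {
        // Disposing the JsonTextReader also closes the underlying TextReader by default.
        using (var reader = new JsonTextReader(textReader))
        {
            // Keep date-like strings as plain strings instead of parsing them to DateTime.
            reader.DateParseHandling = DateParseHandling.None;
            while (reader.Read())
            {
                if (reader.TokenType != JsonToken.PropertyName || (string)reader.Value != key)
                    continue;

                // Advance to the token that holds the property's value.
                if (!reader.Read())
                    yield break;

                switch (reader.TokenType)
                {
                    case JsonToken.String:
                    case JsonToken.Integer:
                    case JsonToken.Float:
                    case JsonToken.Boolean:
                        yield return Convert.ToString(reader.Value, CultureInfo.InvariantCulture);
                        break;
                    // Objects, arrays and nulls are not reported here. The outer loop
                    // keeps reading, so matching keys nested inside them are still found.
                }
            }
        }
    }
}
```

With a file on disk, this could be called as `foreach (var value in JsonKeyFinder.FindValues(File.OpenText(path), "foo")) Console.WriteLine("MATCH: " + value);`, where `path` is wherever your JSON file lives. Because the text is tokenized exactly once and nothing is reserialized, the cost should grow roughly linearly with file size, but you should still profile it against your real data.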

dbc