
I'm trying to parse a huge JSON file into a 2D array.

I can parse it, but the required memory is almost 10 times the file size.

My sample.json file has 100,000 rows, and the fields differ from row to row.

If sample.json is 500 MB, this code needs about 5 GB.

How can I reduce memory usage?

I am using Newtonsoft.Json on .NET 6.0.

Read from JSON:


        static void Read()
        {
            // every parsed row is kept in this list for later display
            List<Dictionary<string, string>> rows = new List<Dictionary<string, string>>();
            string path = @"D:\small.json";

            using (FileStream fsRead = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            using (BufferedStream bsRead = new BufferedStream(fsRead))
            using (StreamReader srRead = new StreamReader(bsRead))
            {
                string? line;
                while ((line = srRead.ReadLine()) != null)
                {
                    // each line is a separate JSON object
                    JObject jsonObject = JObject.Parse(line);
                    MakeRowData(jsonObject, out var row);

                    rows.Add(row);
                }
            }
        }

Make a row:

        private static void MakeRowData(JObject jsonData, out Dictionary<string, string> row)
        {
            Dictionary<string, string> output = new Dictionary<string, string>();

            foreach (var item in jsonData)
            {
                if (item.Value != null)
                {
                    int childSize = item.Value.Children().Count();

                    // if the item has children, explore deeper
                    if (childSize > 0)
                    {
                        ExploreChild(item.Value, ref output);
                    }
                    // otherwise just add a new entry
                    else
                    {
                        string str = item.Value.ToString();
                        output[item.Key] = str ?? "";
                    }
                }
            }
            row = output;
        }

        private static void ExploreChild(JToken jToken, ref Dictionary<string, string> row)
        {
            foreach (var item in jToken)
            {
                int childSize = item.Children().Count();

                // if the item has children, explore deeper
                if (childSize > 0)
                {
                    ExploreChild(item, ref row);
                }
                // otherwise just add a new entry, keyed by its path
                else
                {
                    string path = jToken.Path.Replace('[', '(').Replace(']', ')');

                    string str = jToken.First.ToString();

                    row[path] = str ?? "";
                }
            }
        }
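
For example, a line like `{"Field1":0,"Nested":{"a":1}}` ends up in `rows` as the flat dictionary `{ "Field1" = "0", "Nested.a" = "1" }`, with array indices rewritten from `[i]` to `(i)`.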
    

EDIT: Added Sample.json

It is a set of JSON strings, one object per line.

And the fields are not fixed.

Sample.json

    {Field1:0,Field2:1,Field2:3}
    {Field1:0,Field5:1,Field6:3}
    {Field1:0,Field7:1,Field9:3}
    {Field1:0,Field13:1,Field50:3,Field57:3}
    ...

  • *How can I reduce memory usage?* by not storing the whole content of the file in memory ... maybe you can stream the output to a file instead of putting it into `rows`, but you didn't write what you want to do with the results – Selvin Jun 23 '22 at 07:06
  • What do you do with the data after deserializing them? Do you write them to a database? Can you process them row by row instead of reading all of the data into a list? – Markus Jun 23 '22 at 07:16
  • These rows will be displayed in a GUI, like a DataGrid or a chart, and used for ordering and data filtering. – YS R Jun 23 '22 at 07:22
  • For such big data, I don't think using JSON is a good idea, an RDB is more applicable. – WAKU Jun 23 '22 at 07:28
  • @YSR divide your JSON data from the file into 2 sections, then retrieve the JSON data in two action calls; after retrieving the complete data, merge it. – Mati Ullah Zahir Jun 23 '22 at 08:24
  • 2
  • Consider using real classes rather than `JObject`. Also why `BufferedStream`? Can we have a sample of your JSON objects? – Charlieface Jun 23 '22 at 08:36
  • Can you share some sample JSON so we can get an idea of what you are doing? Is it dynamic, or does it have a fixed schema? If the schema is fixed, deserialization will definitely use less memory than a string dictionary. There are already many questions about reading huge JSON files efficiently, from [How to parse huge JSON file as stream in Json.NET?](https://stackoverflow.com/q/43747477) to [Newtonsoft json.net JsonTextReader Garbage Collector intensive](https://stackoverflow.com/q/55812343/3744182). We need to know specifics about your JSON and your algorithms to go beyond those answers. – dbc Jun 23 '22 at 18:50
  • @WAKU I agree. But the client already did. – YS R Jun 23 '22 at 22:54
  • @Mati Ullah Zahir What are the benefits? – YS R Jun 23 '22 at 22:59
  • Edit to Add Sample.json – YS R Jun 23 '22 at 23:15
  • @YSR - Your "Sample.json" is not well-formed. Firstly the property names are not quoted. (I will assume that is a typo in the question.) Secondly, it actually looks like [newline delimited JSON](http://ndjson.org/). Can you confirm you are actually parsing NDJSON? And do you ever have values that are arrays or nested objects such as `{"Field1" : [{"a" : "a value"}]}` ? – dbc Jun 23 '22 at 23:31
  • @dbc Yes, it is newline-delimited JSON; I didn't know about NDJSON. And there are arrays too, not only integers. – YS R Jun 24 '22 at 00:55

1 Answer


You can try replacing the recursive exploration of the children with an iterative one. Something like this:

    private static void MakeRowData(JObject jsonData, out Dictionary<string, string> row)
    {
        Dictionary<string, string> output = new Dictionary<string, string>();
        foreach (var item in jsonData)
        {
            if (item.Value != null)
            {
                // if the item has children, explore them iteratively
                if (item.Value.HasValues)
                {
                    var queue = new Queue<JToken>();
                    queue.Enqueue(item.Value);
                    while (queue.Any())
                    {
                        var currItem = queue.Dequeue();
                        if (currItem.HasValues)
                        {
                            foreach (var child in currItem)
                                queue.Enqueue(child);
                        }
                        else
                        {
                            // add the item without children to the row,
                            // keyed by its path as in the original code
                            output[currItem.Path.Replace('[', '(').Replace(']', ')')] = currItem.ToString();
                        }
                    }
                }
                // otherwise just add a new entry
                else
                {
                    string str = item.Value.ToString();
                    output[item.Key] = str ?? "";
                }
            }
        }
        row = output;
    }

Recursive calls, unless they are tail calls, keep the stack frames of the methods that made them alive. This can lead to extra memory usage when the JSON is deeply nested.
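
That said, most of the memory here goes to keeping every parsed row in the `rows` list. As the comments suggest, you can also process each newline-delimited line as soon as it is read and hand it off (to a database, a file, or the grid's data source) instead of accumulating everything. A minimal sketch, reusing `MakeRowData` from the question and assuming a hypothetical `processRow` callback:

    // Requires: using System; using System.Collections.Generic;
    //           using System.IO; using Newtonsoft.Json.Linq;
    static void ReadStreaming(string path, Action<Dictionary<string, string>> processRow)
    {
        using (var srRead = new StreamReader(path))
        {
            string? line;
            while ((line = srRead.ReadLine()) != null)
            {
                // parse one NDJSON line at a time
                JObject jsonObject = JObject.Parse(line);
                MakeRowData(jsonObject, out var row);

                // hand the row off instead of adding it to a list,
                // so only the current row is held in memory
                processRow(row);
            }
        }
    }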

– Mykola Tarasyuk