I have an API that receives JSON calls and writes each payload to its own file (800 KB-1 MB each). I would like an hourly task that takes all of the JSON files from the last hour and combines them into a single file, to make daily/monthly analytics easier.

Each file consists of a collection of data, in the form [ object {property: value, ... ]. Because of this, I cannot simply concatenate the files: the result would no longer be valid JSON (and adding a comma between them would produce a collection of collections). I would like to keep the memory footprint as low as possible, so I was looking at the example below and just pushing each file to the stream, deserializing the file with JsonConvert.DeserializeObject(fileContent); however, doing this I still end up with a collection of collections. I have also tried using a JArray instead of JsonConvert, pushing into a list declared outside the foreach, but that gives the same result. If I move the Serialize call outside the foreach, it does work; however, I am worried about holding the 4-6 GB worth of items in memory.

In summary, I'm ending up with [ [ object {property: value, ... ],... [ object {property: value, ... ]] where my desired output would be [ object {property: value (file1), ... object {property: value (fileN) ].

        using (FileStream fs = File.Open(@"C:\Users\Public\Documents\combined.json", FileMode.CreateNew))
        {
            using (StreamWriter sw = new StreamWriter(fs))
            {
                using (JsonWriter jw = new JsonTextWriter(sw))
                {
                    jw.Formatting = Formatting.None;

                    JArray list = new JArray();
                    JsonSerializer serializer = new JsonSerializer();

                    foreach (IListBlobItem blob in blobContainer.ListBlobs(prefix: "SharePointBlobs/"))
                    {
                        if (blob is CloudBlockBlob blockBlob)
                        {
                            var content = blockBlob.DownloadText();
                            var deserialized = JArray.Parse(content);
                            //deserialized = JsonConvert.DeserializeObject(content);
                            list.Merge(deserialized);
                            serializer.Serialize(jw, list);
                        }
                        else
                        {
                            Console.WriteLine("Non-Block-Blob: " + blob.StorageUri);
                        }
                    }
                }
            }
        }
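
For illustration, one way to produce a single valid array while only ever holding one file's worth of tokens in memory is to open the combined array once and copy each file's elements token by token with JsonWriter.WriteToken. This is only a sketch of that technique: the hard-coded JSON strings stand in for the blockBlob.DownloadText() calls in the code above.

```csharp
// Sketch (assumes Newtonsoft.Json is referenced; the string list below
// is a placeholder for the downloaded blob contents).
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

class StreamingCombineDemo
{
    static void Main()
    {
        var fileContents = new List<string>
        {
            "[{\"id\":1},{\"id\":2}]",
            "[{\"id\":3}]"
        };

        using (var sw = new StringWriter())
        {
            using (JsonWriter jw = new JsonTextWriter(sw))
            {
                jw.Formatting = Formatting.None;
                jw.WriteStartArray();               // open the combined array once

                foreach (var content in fileContents)
                {
                    using (var reader = new JsonTextReader(new StringReader(content)))
                    {
                        reader.Read();              // consume the file's own StartArray
                        while (reader.Read() && reader.TokenType != JsonToken.EndArray)
                        {
                            // WriteToken copies the current object and all of its
                            // children to the writer without building a JArray.
                            jw.WriteToken(reader);
                        }
                    }
                }

                jw.WriteEndArray();                 // close the combined array once
            }
            Console.WriteLine(sw.ToString());       // [{"id":1},{"id":2},{"id":3}]
        }
    }
}
```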
Steven Mayer

1 Answer


In this situation, to keep your processing and memory footprint low, I think I would just concatenate the files one after the other, even though the result is technically invalid JSON. To deserialize the combined file later, you can take advantage of the SupportMultipleContent setting on the JsonTextReader class and process the object collections through a stream as if they were one whole collection. See this answer for an example of how to do this.
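
A minimal sketch of that reading side, assuming the combined file was produced by straight concatenation and that Newtonsoft.Json is referenced; the file path and the per-item processing are placeholders:

```csharp
// Sketch: reading back-to-back JSON arrays from one file with
// SupportMultipleContent, one collection at a time.
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

class CombinedReaderDemo
{
    static void Main()
    {
        using (var sr = new StreamReader(@"C:\Users\Public\Documents\combined.json"))
        using (var reader = new JsonTextReader(sr))
        {
            // Accept multiple top-level JSON values back to back
            // instead of failing after the first array.
            reader.SupportMultipleContent = true;

            var serializer = new JsonSerializer();
            while (reader.Read())
            {
                // Each pass lands on the StartArray of the next
                // concatenated collection; only that one collection
                // (roughly one source file) is in memory at a time.
                if (reader.TokenType == JsonToken.StartArray)
                {
                    var items = serializer.Deserialize<JArray>(reader);
                    foreach (var item in items)
                    {
                        // ... run analytics over each object ...
                    }
                }
            }
        }
    }
}
```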

Brian Rogers