I have a large JSON file (around 16 GB) with the following structure:
{
  "Job": {
    "Keys": {
      "JobID": "test123",
      "DeviceID": "TEST01"
    },
    "Props": {
      "FileType": "Measurements",
      "InstrumentDescriptions": [
        {
          "InstrumentID": "1723007",
          "InstrumentType": "Actual1",
          "Name": "U",
          "DataType": "Double",
          "Units": "degC"
        },
        {
          "InstrumentID": "2424009",
          "InstrumentType": "Actual2",
          "Name": "VG03",
          "DataType": "Double",
          "Units": "Pa"
        }
      ]
    },
    "Steps": [
      {
        "Keys": {
          "StepID": "START",
          "StepResult": "NormalEnd"
        },
        "InstrumentData": [
          {
            "Keys": {
              "InstrumentID": "1723007"
            },
            "Measurements": [
              {
                "DateTime": "2021-11-16 21:18:37.000",
                "Value": 540
              },
              {
                "DateTime": "2021-11-16 21:18:37.100",
                "Value": 539
              },
              {
                "DateTime": "2021-11-16 21:18:37.200",
                "Value": 540
              },
              {
                "DateTime": "2021-11-16 21:18:37.300",
                "Value": 540
              },
              {
                "DateTime": "2021-11-16 21:18:37.400",
                "Value": 540
              },
              {
                "DateTime": "2021-11-16 21:18:37.500",
                "Value": 540
              },
              {
                "DateTime": "2021-11-16 21:18:37.600",
                "Value": 540
              },
              {
                "DateTime": "2021-11-16 21:18:37.700",
                "Value": 538
              },
              {
                "DateTime": "2021-11-16 21:18:37.800",
                "Value": 540
              }
            ]
          },
          {
            "Keys": {
              "InstrumentID": "2424009"
            },
            "Measurements": [
              {
                "DateTime": "2021-11-16 21:18:37.000",
                "Value": 1333.22
              },
              {
                "DateTime": "2021-11-16 21:18:37.100",
                "Value": 1333.22
              },
              {
                "DateTime": "2021-11-16 21:18:37.200",
                "Value": 1333.22
              },
              {
                "DateTime": "2021-11-16 21:18:37.300",
                "Value": 1333.22
              },
              {
                "DateTime": "2021-11-16 21:18:37.400",
                "Value": 1333.22
              },
              {
                "DateTime": "2021-11-16 21:18:37.500",
                "Value": 1333.22
              },
              {
                "DateTime": "2021-11-16 21:18:37.600",
                "Value": 1333.22
              },
              {
                "DateTime": "2021-11-16 21:18:37.700",
                "Value": 1333.22
              },
              {
                "DateTime": "2021-11-16 21:18:37.800",
                "Value": 1333.22
              }
            ]
          }
        ]
      }
    ]
  }
}
The problem
I would like to split this file into multiple files on the "InstrumentData" array, because that array holds the bulk of the data. With smaller files I could parse each one without running into an OutOfMemoryException.
Current state
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static void SplitJson(string filename, string arrayPropertyName)
{
    string templateFileName = @"C:\Temp\template.json";
    string arrayFileName = @"C:\Temp\array.json";

    // CreateEmptyFile is a helper of mine that is not shown here
    CreateEmptyFile(templateFileName);
    CreateEmptyFile(arrayFileName);

    using (Stream stream = File.OpenRead(filename))
    using (JsonReader reader = new JsonTextReader(new StreamReader(stream)))
    using (JsonWriter templateWriter = new JsonTextWriter(new StreamWriter(templateFileName)))
    using (JsonWriter arrayWriter = new JsonTextWriter(new StreamWriter(arrayFileName)))
    {
        if (reader.Read() && reader.TokenType == JsonToken.StartObject)
        {
            templateWriter.WriteStartObject();
            // This loop only ever sees the property names of the root object
            while (reader.Read() && reader.TokenType != JsonToken.EndObject)
            {
                string propertyName = (string)reader.Value;
                reader.Read();
                templateWriter.WritePropertyName(propertyName);
                if (propertyName == arrayPropertyName)
                {
                    arrayWriter.WriteToken(reader);
                    templateWriter.WriteStartObject(); // empty placeholder object
                    templateWriter.WriteEndObject();
                }
                else if (reader.TokenType == JsonToken.StartObject ||
                         reader.TokenType == JsonToken.StartArray)
                {
                    // Nested objects and arrays are copied whole, never inspected
                    templateWriter.WriteToken(reader);
                }
                else
                {
                    templateWriter.WriteValue(reader.Value);
                }
            }
            templateWriter.WriteEndObject();
        }
    }

    // Now read the huge array file and combine each item in the array
    // with the template to make new files
    JObject template = JObject.Parse(File.ReadAllText(templateFileName));
    using (JsonReader arrayReader = new JsonTextReader(new StreamReader(arrayFileName)))
    {
        int counter = 0;
        while (arrayReader.Read())
        {
            if (arrayReader.TokenType == JsonToken.StartObject)
            {
                counter++;
                JObject item = JObject.Load(arrayReader);
                template[arrayPropertyName] = item;
                // Note: my JSON has no root-level "name" or "age", so both resolve to null here
                string fileName = string.Format(@"C:\Temp\output_{0}_{1}_{2}.json",
                    template["name"], template["age"], counter);
                File.WriteAllText(fileName, template.ToString());
            }
        }
    }

    // Clean up temporary files
    File.Delete(templateFileName);
    File.Delete(arrayFileName);
}
I am using this method to try to split the file into smaller files. However, it can only split on properties at the root level of the JSON, and "InstrumentData" is nested inside "Job" and "Steps".
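To make the limitation concrete, here is a minimal sketch that only reports where the array sits, using Json.NET's JsonReader.Path while streaming. The class and method names (ArrayLocator, FindArrayPaths) are hypothetical and exist only for this illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public static class ArrayLocator
{
    // Hypothetical helper: returns the JSON path of every property called
    // arrayPropertyName whose value is an array, at any nesting depth.
    public static List<string> FindArrayPaths(TextReader input, string arrayPropertyName)
    {
        var paths = new List<string>();
        using (var reader = new JsonTextReader(input))
        {
            while (reader.Read())
            {
                if (reader.TokenType != JsonToken.PropertyName ||
                    (string)reader.Value != arrayPropertyName)
                {
                    continue;
                }

                // On a PropertyName token, Path already includes the property itself
                string path = reader.Path;

                // Move to the property's value and check that it is an array
                if (reader.Read() && reader.TokenType == JsonToken.StartArray)
                {
                    paths.Add(path);
                    reader.Skip(); // jump to the end of the array without loading it
                }
            }
        }
        return paths;
    }
}
```

For the structure shown at the top, this should report the single path Job.Steps[0].InstrumentData, which is three levels below the root object that SplitJson compares property names against.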
The question
Am I on the right track with this approach, and is it an efficient way to tackle the problem? How do I efficiently split the JSON into multiple files on a nested array? The result should be one file for each element of the "InstrumentData" array, with all other properties and structure retained in each of the split files.
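To show the kind of result I am after, here is a rough, untested sketch of a two-pass variant that matches the property name at any depth instead of only at the root. The names NestedArraySplitter, Split and outputDir are hypothetical. It assumes that everything outside the target array is small, that a single array element fits in memory, and that the first pass may write temporary files roughly as large as the input.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class NestedArraySplitter
{
    // Hypothetical helper: writes one output file per element of every array
    // stored under arrayPropertyName, and returns the number of files written.
    public static int Split(string inputFile, string arrayPropertyName, string outputDir)
    {
        var parts = new List<KeyValuePair<string, string>>(); // (array path, temp file)
        JToken template;

        // Pass 1: stream the input once. Array elements go to temp files,
        // everything else goes into a small in-memory template.
        using (var reader = new JsonTextReader(new StreamReader(inputFile)))
        using (var templateWriter = new JTokenWriter())
        {
            reader.DateParseHandling = DateParseHandling.None; // keep "DateTime" strings verbatim
            while (reader.Read())
            {
                bool isTarget = reader.TokenType == JsonToken.PropertyName
                                && (string)reader.Value == arrayPropertyName;
                templateWriter.WriteToken(reader, false); // copy only the current token
                if (!isTarget) continue;

                string arrayPath = reader.Path; // e.g. Job.Steps[0].InstrumentData
                reader.Read();                  // move to the property's value
                if (reader.TokenType != JsonToken.StartArray)
                {
                    templateWriter.WriteToken(reader, false); // not an array: keep scanning
                    continue;
                }

                // Leave an empty array in the template as a placeholder
                templateWriter.WriteStartArray();
                templateWriter.WriteEndArray();

                while (reader.Read() && reader.TokenType != JsonToken.EndArray)
                {
                    string tempFile = Path.Combine(outputDir, "part_" + parts.Count + ".tmp");
                    using (var partWriter = new JsonTextWriter(new StreamWriter(tempFile)))
                    {
                        partWriter.WriteToken(reader, true); // stream the whole element to disk
                    }
                    parts.Add(new KeyValuePair<string, string>(arrayPath, tempFile));
                }
            }
            template = templateWriter.Token;
        }

        // Pass 2: put each element back into the template on its own and save it.
        int count = 0;
        foreach (var part in parts)
        {
            var array = (JArray)template.SelectToken(part.Key);
            using (var partReader = new JsonTextReader(new StreamReader(part.Value)))
            {
                partReader.DateParseHandling = DateParseHandling.None;
                array.Add(JToken.ReadFrom(partReader));
            }
            count++;
            File.WriteAllText(Path.Combine(outputDir, "output_" + count + ".json"),
                              template.ToString());
            array.Clear();           // reset the placeholder for the next element
            File.Delete(part.Value); // clean up the temp file
        }
        return count;
    }
}
```

A call such as NestedArraySplitter.Split(@"C:\Temp\input.json", "InstrumentData", @"C:\Temp") (the paths are placeholders) would be expected to produce output_1.json and output_2.json for the sample above, each containing the full "Job" structure with a single "InstrumentData" element. I am not sure whether this is reasonable for a 16 GB input.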