I'm trying to split very large JSON files into smaller files for a given array. For example:
{
"headerName1": "headerVal1",
"headerName2": "headerVal2",
"headerName3": [{
"element1Name1": "element1Value1"
},
{
"element2Name1": "element2Value1"
},
{
"element3Name1": "element3Value1"
},
{
"element4Name1": "element4Value1"
},
{
"element5Name1": "element5Value1"
},
{
"element6Name1": "element6Value1"
}]
}
...down to { "elementNName1": "elementNValue1" } where N is a large number
The user provides the name which represents the array to be split (in this example "headerName3") and the number of array objects per file, e.g. 1,000,000
This would result in N files each containing the top name:value pairs (headerName1, headerName3) and up to 1,000,000 of the headerName3 objects in each file.
I'm using the excellent Newtonsof JSON.net and understand that I need to do this using a stream.
So far I have looked a reading in JToken objects to establish where the PropertyName == "headerName3" occurs when reading in the tokens but what I would like to do is then read in the entire JSON object for each object in the array and not have to continue parsing JSON into JTokens;
Here's a snippet of the code I am building so far:
using (StreamReader oSR = File.OpenText(strInput))
{
using (var reader = new JsonTextReader(oSR))
{
while (reader.Read())
{
if (reader.TokenType == JsonToken.StartObject)
{
intObjectCount++;
}
else if (reader.TokenType == JsonToken.EndObject)
{
intObjectCount--;
if (intObjectCount == 1)
{
intArrayRecordCount++;
// Here I want to read the entire object for this record into an untyped JSON object
if( intArrayRecordCount % 1000000 == 0)
{
//write these to the split file
}
}
}
}
}
}
I don't know - and in fact, and am not concerned with - the structure of the JSON itself, and the objects can be of varying structures within the array. I am therefore not serializing to classes.
Is this the right approach? Is there a set of methods in the JSON.net library I can easily use to perform such operation?
Any help appreciated.