I have (potentially large) JSON files being uploaded that need to be written out somewhere else. I would like to do at least some basic validation (for example, make sure they are valid JSON, and maybe even apply a schema), but I'd like to avoid loading the entire (again, potentially large) file into memory only to write it back out again. I'm using Json.NET and thought I could do something like this:
```csharp
using (var sr = new StreamReader(source))
using (var jsonReader = new JsonTextReader(sr))
using (var textWriter = new StreamWriter(myoutputStream))
using (var outputStream = new JsonTextWriter(textWriter))
{
    while (jsonReader.Read())
    {
        // TODO: any additional validation!
        outputStream.WriteToken(jsonReader);
    }
}
```
The idea is that the reader walks the JSON file as it comes in and writes out each token as it is processed. If the input contains a mistake, the reader throws an exception, which I can handle by returning an error message to the user.
The problem: when I step through this code with a JSON file consisting of a single object with an array property that holds a collection of further objects (the whole file is about 1.3k lines when formatted), I expected it to step through token by token. Instead, it seems to read the entire object and write it back out in a single step.
Is there a way to read large JSON objects from a stream, make sure they really are valid JSON, and stream them back out without having to hold the entire object in memory at once?
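The "whole object in one step" behavior is consistent with how the single-argument `JsonWriter.WriteToken(JsonReader)` overload is documented: it writes the reader's current token and its children, so when the reader sits on the root `StartObject` the call reads through and copies the entire object. Below is a minimal sketch of a token-at-a-time copy, assuming Json.NET's `WriteToken(JsonReader, bool writeChildren)` overload with `writeChildren` set to `false`. `StreamingJsonCopy` and `Copy` are hypothetical names, and returning an error string is just one way to report failures.

```csharp
using System.IO;
using Newtonsoft.Json;

public static class StreamingJsonCopy
{
    // Hypothetical helper: copies JSON from input to output one token at a time.
    // Returns null on success, or an error message if the input could not be parsed.
    public static string Copy(TextReader input, TextWriter output)
    {
        // DateParseHandling.None keeps date-like strings as plain strings,
        // so they are written back exactly as they were read.
        var reader = new JsonTextReader(input)
        {
            CloseInput = false,
            DateParseHandling = DateParseHandling.None
        };
        // Not wrapped in using: closing a JsonWriter can auto-complete any
        // still-open containers, which would hide a truncated input.
        var writer = new JsonTextWriter(output) { CloseOutput = false };
        int depth = 0;          // objects/arrays opened but not yet closed
        bool sawToken = false;
        try
        {
            while (reader.Read())
            {
                sawToken = true;
                if (reader.TokenType == JsonToken.StartObject || reader.TokenType == JsonToken.StartArray)
                    depth++;
                else if (reader.TokenType == JsonToken.EndObject || reader.TokenType == JsonToken.EndArray)
                    depth--;

                // TODO: additional validation of the current token goes here.

                // writeChildren: false writes only the current token and does not advance the reader.
                writer.WriteToken(reader, false);
            }
        }
        catch (JsonReaderException ex)
        {
            return ex.Message;
        }
        writer.Flush();
        if (!sawToken) return "Input is empty.";
        if (depth != 0) return "Input ended before every object/array was closed.";
        return null;
    }
}
```

Because tokens are written as soon as they are read, the destination already holds a partial document when an error is detected, so it is probably safest to write to a temporary location and only commit it when `Copy` returns `null`. Note also that `JsonTextReader` is somewhat lenient (it accepts, for example, comments and single-quoted strings), so "parsed without an exception" is slightly weaker than "strictly valid JSON".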
Although the answer might be more general, the data I'm currently handling is GeoJSON. A (very short) example looks like this:
```json
{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [125.6, 10.1]
  },
  "properties": {
    "name": "Dinagat Islands"
  }
}
```
A much longer example might be:
```json
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "name": "Van Dorn Street",
        "marker-color": "#0000ff",
        "marker-symbol": "rail-metro",
        "line": "blue"
      },
      "geometry": {
        "type": "Point",
        "coordinates": [
          -77.12911152370515,
          38.79930767201779
        ]
      }
    }, ... // lots more objects
  ]
}
```
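Since the reader reports a `Depth` and `Path` for every token, some GeoJSON-specific checks can run during the same streaming pass without materializing the document. A minimal sketch, assuming only two illustrative rules (the root `type` must be a string, and each object directly inside the root `features` array is counted as a feature); `GeoJsonStreamCheck` and `Scan` are hypothetical names, and this is nowhere near a full GeoJSON validator:

```csharp
using System;
using System.IO;
using Newtonsoft.Json;

public static class GeoJsonStreamCheck
{
    // Illustrative streaming checks: every token is read (syntax errors surface as
    // JsonReaderException), but only a few counters and one string are kept in memory.
    public static (string RootType, int FeatureCount) Scan(TextReader input)
    {
        string rootType = null;
        int featureCount = 0;
        int open = 0; // objects/arrays opened but not yet closed
        using (var reader = new JsonTextReader(input) { CloseInput = false })
        {
            while (reader.Read())
            {
                JsonToken token = reader.TokenType;
                if (token == JsonToken.StartObject || token == JsonToken.StartArray) open++;
                else if (token == JsonToken.EndObject || token == JsonToken.EndArray) open--;

                // Property names of the root object are reported at Depth 1.
                if (token == JsonToken.PropertyName && reader.Depth == 1
                    && (string)reader.Value == "type")
                {
                    reader.Read(); // move to the property's value
                    if (reader.TokenType != JsonToken.String)
                        throw new InvalidDataException("Root \"type\" must be a string.");
                    rootType = (string)reader.Value;
                }
                // A feature is an object directly inside the root "features" array,
                // reported at Depth 2 with a Path such as "features[0]".
                else if (token == JsonToken.StartObject && reader.Depth == 2
                         && reader.Path.StartsWith("features[", StringComparison.Ordinal))
                {
                    featureCount++;
                }
            }
        }
        if (open != 0)
            throw new InvalidDataException("Input ended before every object/array was closed.");
        if (rootType == null)
            throw new InvalidDataException("Root object has no \"type\" property.");
        return (rootType, featureCount);
    }
}
```

In an upload handler, checks like these would presumably sit inside the copy loop itself, so the file is still read only once.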
The documentation at https://www.newtonsoft.com/json/help/html/ReadingWritingJSON.htm suggests that the reader should read individual tokens (`StartObject`, `PropertyName`, etc.).
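To see the individual tokens that page describes, here is a small illustrative sketch that records each token's depth, type, and value exactly as `JsonTextReader` reports them; `TokenDump.Describe` is a hypothetical helper, not part of Json.NET.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public static class TokenDump
{
    // Hypothetical helper: returns one entry per Read() call, formatted as
    // "<Depth> <TokenType>" or "<Depth> <TokenType>: <Value>".
    public static List<string> Describe(string json)
    {
        var tokens = new List<string>();
        using (var reader = new JsonTextReader(new StringReader(json)))
        {
            while (reader.Read())
            {
                // Value is null for structural tokens such as StartObject and EndArray;
                // FormattableString.Invariant keeps number formatting culture-independent.
                tokens.Add(reader.Value == null
                    ? FormattableString.Invariant($"{reader.Depth} {reader.TokenType}")
                    : FormattableString.Invariant($"{reader.Depth} {reader.TokenType}: {reader.Value}"));
            }
        }
        return tokens;
    }
}
```

Run on the `geometry` fragment from the short example, `{"type":"Point","coordinates":[125.6,10.1]}`, this yields nine separate entries, from `0 StartObject` through `2 Float: 125.6` to `0 EndObject`, one per `Read()` call.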