I have an external data file, compressed as a .gz file (hosted in S3), that I am reading in for processing. It contains JSON objects (one per line, millions of lines per file), as in the mocked-up example below:
{"a":"v1","b":"v2"}
{"a":"v3","b":"v2"}
I am using the following code to process it:
JsonSerializer serializer = new JsonSerializer();
using (GZipStream decompressionStream = new GZipStream(data, CompressionMode.Decompress))
{
    using (StreamReader sr = new StreamReader(decompressionStream))
    {
        using (var reader = new JsonTextReader(sr))
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    var o = serializer.Deserialize<DataObject>(reader);
                }
            }
        }
    }
}
DataObject is just a POCO to map the data into. The first iteration works perfectly, but on the second iteration I get the following exception from reader.Read():
Additional text encountered after finished reading JSON content
I think this could be due to the linefeed at the end of each JSON object, but I am not sure how to resolve it.
Any help would be very much appreciated.
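For reference, here is a minimal sketch of one thing that may be worth trying, assuming Json.NET's `JsonTextReader.SupportMultipleContent` property is meant for input like this (several root-level JSON values in one stream). It reads from an in-memory string instead of the gzipped S3 stream, and `LineDelimitedJson`, `ReadObjects`, and the `a`/`b` properties on `DataObject` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical POCO matching the mocked-up sample data.
public class DataObject
{
    public string a { get; set; }
    public string b { get; set; }
}

public static class LineDelimitedJson
{
    // Lazily yields one DataObject per root-level JSON object in the input,
    // so a file with millions of lines is never held in memory at once.
    public static IEnumerable<DataObject> ReadObjects(TextReader input)
    {
        var serializer = new JsonSerializer();
        using (var reader = new JsonTextReader(input))
        {
            // Assumption: this tells the reader to accept more than one
            // root-level value instead of throwing after the first one.
            reader.SupportMultipleContent = true;

            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    yield return serializer.Deserialize<DataObject>(reader);
                }
            }
        }
    }
}
```

With the real data, the `StreamReader` wrapped around the `GZipStream` could be passed in place of the in-memory reader; I have not verified this against the full files.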