
I received a few JSON data files; however, they have BSON datatypes included in each object. On top of that, it's a really large tojson dump (millions of records).

I am trying to deserialize the data and, as expected, it fails.

The JSON file has things like:

"someKey" : NumberLong("1234567889"),

It also has ISODate in there...

Is there a way to handle this with Json.NET? It seems like there should be some setting to have it use a custom function rather than the built-in parser for specific keys?

*Updated to include code for the stream + text reader for the very large (100 GB+) file:

    using (StreamReader file = File.OpenText(@"\\largedump.txt"))
    using (JsonTextReader reader = new JsonTextReader(file))
    {
        // allow multiple top-level JSON documents in one stream
        reader.SupportMultipleContent = true;
        var serializer = new JsonSerializer();
        while (reader.Read())
        {
            if (reader.TokenType == JsonToken.StartObject)
            {
                Contacts c = serializer.Deserialize<Contacts>(reader);
                Console.WriteLine(c.orgId);
            }
        }
    }
  • `{"someKey" : NumberLong("1234567889")}` is not valid JSON. See the [JSON Standard](http://www.json.org/). That being said, Json.NET supports some extensions to the standard, including [constructors](https://stackoverflow.com/questions/36958680). If you could preprocess your JSON to `{"someKey" : new NumberLong("1234567889")}` you could then parse it with Json.NET – dbc Aug 11 '16 at 17:46
  • Yup - it's invalid because the DBAs didn't dump it with strict mode, which would have been better as it would have represented it in JSON with '$numberLong', which I had already coded for... and now I have to adjust. Seems like I'd have to capture the string from the reader to pre-process it... – zxed Aug 11 '16 at 18:04
  • I think you may need to use some sort of Regex to insert the `new` before the `NumberLong` (or just remove it entirely), streaming the result to a temp file. – dbc Aug 11 '16 at 18:09
  • Would you say that either inserting the `new`... or just removing the `NumberLong("` and its ending `")`... would yield the same result? :) – zxed Aug 11 '16 at 18:23
  • Removing NumberLong and its ending would make it possible to deserialize `someKey` directly into a `string`, `long` or `BigInteger` without needing a custom `JsonConverter` - but it probably requires a more complex regex. Adding the `new` will require a converter to deserialize but the regex could be very simple. So, go with whatever is easiest for you. (For me writing a converter would be easier.) – dbc Aug 11 '16 at 18:27
  • 1
    doesnt seem like there is a simple enough way to dump the stream to file as you have to rely on the jsontextreader to find the end of the collection object - which itself throws an error because of the invalid json.... I ended up just pre-streaming the entire file and scrubbing out the junk... – zxed Aug 11 '16 at 21:08
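
For reference, here is a minimal sketch (not from the thread) of the converter approach dbc describes in the comments above. It assumes the dump has first been preprocessed so that `NumberLong("…")` becomes `new NumberLong("…")`, which Json.NET tokenizes as a constructor; the `NumberLongConverter` name is illustrative, not an existing type:

    using System;
    using Newtonsoft.Json;

    // Hypothetical converter: handles values written as new NumberLong("1234567889")
    // as well as plain numbers/strings, mapping them to long properties.
    public class NumberLongConverter : JsonConverter
    {
        public override bool CanConvert(Type objectType)
        {
            return objectType == typeof(long) || objectType == typeof(long?);
        }

        public override object ReadJson(JsonReader reader, Type objectType,
            object existingValue, JsonSerializer serializer)
        {
            if (reader.TokenType == JsonToken.Null)
                return null;

            if (reader.TokenType == JsonToken.StartConstructor)
            {
                // new NumberLong("...") is tokenized as StartConstructor, String, EndConstructor
                long value = 0;
                while (reader.Read() && reader.TokenType != JsonToken.EndConstructor)
                {
                    if (reader.TokenType == JsonToken.String || reader.TokenType == JsonToken.Integer)
                        value = Convert.ToInt64(reader.Value);
                }
                return value;
            }

            // plain numeric or string values still deserialize
            return Convert.ToInt64(reader.Value);
        }

        public override bool CanWrite
        {
            get { return false; }
        }

        public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
        {
            throw new NotSupportedException();
        }
    }

The converter would then be registered via `[JsonConverter(typeof(NumberLongConverter))]` on the affected `long` properties or through `JsonSerializerSettings.Converters`; an analogous converter would be needed for the `ISODate(...)` values once they are rewritten as `new ISODate(...)`.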

1 Answer


You can use the MongoDB driver's BSON serializer:

    using MongoDB.Bson;
    using MongoDB.Bson.Serialization;

    var bjson = @"{
                      '_id' : ObjectId('57ac672e34780e59784d7d2a'),
                      'ActivePick' : null,
                      'EventCodeId' : null,
                      'Frame' : { '$binary' : 'AgY=', '$type' : '00' },
                      'FrameTimeStamp' : ISODate('2016-08-11T11:53:18.541Z'),
                      'ServerUserId' : 0,
                      'ServerUserName' : null,
                      'SesionId' : 0,
                      'TraderId' : null,
                      'TraderName' : null
                  }";

    // parse the shell-style extended JSON into a BsonDocument,
    // then map it onto the target class
    var bsonDocument = BsonDocument.Parse(bjson);
    var myObj = BsonSerializer.Deserialize<FrameDocument>(bsonDocument);

source here
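
The `FrameDocument` class isn't shown in the answer; a plausible shape, guessed from the sample document above (all property names and types here are assumptions), would be:

    using System;
    using MongoDB.Bson;

    // Hypothetical POCO matching the sample document; the driver maps the
    // "_id" element to the Id property by convention and the rest by name.
    public class FrameDocument
    {
        public ObjectId Id { get; set; }
        public string ActivePick { get; set; }
        public string EventCodeId { get; set; }
        public byte[] Frame { get; set; }
        public DateTime FrameTimeStamp { get; set; }
        public int ServerUserId { get; set; }
        public string ServerUserName { get; set; }
        public int SesionId { get; set; }
        public string TraderId { get; set; }
        public string TraderName { get; set; }
    }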

EDIT

I had no issues with the given approach. Please see the GitHub solution; it deserializes the data without issues.

    string line;
    using (TextReader file = File.OpenText("ImportDataFromBJsonFile\\a.json"))
    {
        // assumes one JSON document per line
        while ((line = file.ReadLine()) != null)
        {
            var bsonDocument = BsonDocument.Parse(line);
            var myObj = BsonSerializer.Deserialize<Zxed>(bsonDocument);
        }
    }

source (sln project)

  • as we need to have a string on input - which is provided by the StreamReader - I see no issues here. – profesor79 Aug 13 '16 at 22:52
  • I don't think it's as simple as that - 32 GB export file; 100,000,000 records; you can't use a simple StreamReader, it needs to be a JsonTextReader so that it can find the start and end of each JSON token (record). Could you kindly update your solution if you know of an alternative to JsonTextReader that does the same? – zxed Aug 17 '16 at 16:54
  • @zxed could you check my edit and post a sample of your data? – profesor79 Aug 17 '16 at 22:32
  • is the assumption that each JSON record is on its own line? – zxed Aug 18 '16 at 13:50
  • the data is not 1 per line - if it was, this wouldn't be an issue, as we wouldn't need JsonTextReader :) for some reason the dump was executed with printjson instead of printjsononeline – zxed Aug 18 '16 at 14:02
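
For the multi-line case discussed in these comments, one option (a sketch, not something verified in this thread) is to bypass Json.NET entirely and stream the dump through the Mongo driver's own JSON reader, which understands the shell-style `ObjectId(...)`, `ISODate(...)` and `NumberLong(...)` syntax and can read one document after another from a single stream. This assumes a 2.x driver whose `MongoDB.Bson.IO.JsonReader` accepts a `TextReader`; older versions may only take a string:

    using System;
    using System.IO;
    using MongoDB.Bson;
    using MongoDB.Bson.IO;
    using MongoDB.Bson.Serialization;

    // Sketch: stream a large shell-style dump where each document spans several lines.
    using (StreamReader file = File.OpenText(@"\\largedump.txt"))
    using (var reader = new JsonReader(file))
    {
        while (!reader.IsAtEndOfFile())
        {
            // deserialize one document at a time; BsonDocument could be swapped
            // for the Contacts class from the question
            var doc = BsonSerializer.Deserialize<BsonDocument>(reader);
            Console.WriteLine(doc.GetValue("orgId", BsonNull.Value));
        }
    }

Since each document is pulled on demand, memory use stays flat regardless of the file size.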