
I want to read a JSON file that contains ObjectId and ISODate.

JSON data:

{
    "_id" : ObjectId("5baca841d25ce14b7d3d017c"),
    "country" : "in",
    "state" : "",
    "date" : ISODate("1902-01-31T00:00:00.000Z")
}
benvc
Rochit Jain
  • That's not valid JSON. – Mike Scotty Oct 05 '18 at 20:03
  • First off, we need a little more information. What have you tried? Do you have any code to show us? Please consider a [mcve]. Also, as @MikeScotty pointed out, that is not valid JSON. – artomason Oct 05 '18 at 20:07
  • Please format your code. More info on this here: https://meta.stackexchange.com/help/formatting – return Oct 05 '18 at 20:10
  • @artomason I got that file from MongoDB, but I need to process it. – Rochit Jain Oct 06 '18 at 07:21
  • Unfortunately, the JSON library included with Python needs your JSON to actually be valid to work. Check out https://docs.python.org/3.7/library/json.html; https://stackoverflow.com/questions/2835559/parsing-values-from-a-json-file should also help you get started. You can use https://jsonlint.com/ to validate your JSON data. – artomason Oct 06 '18 at 12:53
  • Possible duplicate of [Parsing values from a JSON file?](https://stackoverflow.com/questions/2835559/parsing-values-from-a-json-file) – artomason Oct 06 '18 at 12:55
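The commenters' point is easy to confirm locally: Python's json module rejects the mongo shell constructors ObjectId(...) and ISODate(...). A minimal check (is_strict_json is a hypothetical helper name, not a library function):

```python
import json

# The document from the question, in mongo shell syntax (not strict JSON).
shell_text = '''{
    "_id" : ObjectId("5baca841d25ce14b7d3d017c"),
    "country" : "in",
    "state" : "",
    "date" : ISODate("1902-01-31T00:00:00.000Z")
}'''


def is_strict_json(text):
    """Return True if the standard json module can parse the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        print("Not valid JSON:", exc)
        return False
    return True


print(is_strict_json(shell_text))
```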

3 Answers


I want to expand a little on Maviles' answer by adding notes from two other SO questions.

First, from «Unable to deserialize PyMongo ObjectId from JSON» we learn that this data is not JSON: it looks like the mongo shell representation of an actual BSON/Mongo Extended JSON object. Native BSON files, on the other hand, are binary, not text.

Second, from «How can I use Python to transform MongoDB's bsondump into JSON?» we can expand on Fabian Fagerholm's answer:

import json
import re

from bson import json_util  # the bson package bundled with pymongo


def read_mongoextjson_file(filename):
    with open(filename, "r") as f:
        # read the entire input; in a real application,
        # you would want to read a chunk at a time.
        # Wrapping in [...] turns a comma-separated series of
        # documents into a single JSON array.
        bsondata = '[' + f.read() + ']'

    # convert the TenGen JSON to Strict JSON
    # here, I just convert the ObjectId, ISODate and NumberInt structures,
    # but it's easy to extend to cover all structures listed at
    # http://www.mongodb.org/display/DOCS/Mongo+Extended+JSON
    jsondata = re.sub(r'ObjectId\s*\(\s*"([0-9a-fA-F]{24})"\s*\)',
                      r'{"$oid": "\1"}',
                      bsondata)
    jsondata = re.sub(r'ISODate\s*\(\s*("[^"]*")\s*\)',
                      r'{"$date": \1}',
                      jsondata)
    jsondata = re.sub(r'NumberInt\s*\(\s*"?(-?\d+)"?\s*\)',
                      r'{"$numberInt": "\1"}',
                      jsondata)

    # now we can parse this as JSON, and use MongoDB's object_hook
    # function to get rich Python data structures inside a dictionary
    data = json.loads(jsondata, object_hook=json_util.object_hook)

    return data

As you can see by comparing the previous version with this one, handling more types is straightforward; use the MongoDB Extended JSON reference for any others you need.
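If pymongo is not available, the same two-step idea (rewrite the shell constructors as Extended JSON wrappers, then decode them in an object_hook) can be sketched with the standard library alone. This is a swapped-in, hand-written hook, not json_util: the names shell_to_extended_json and ext_object_hook are hypothetical, ObjectIds come back as plain hex strings rather than bson.ObjectId, and only ObjectId and ISODate are covered.

```python
import json
import re
from datetime import datetime


def shell_to_extended_json(text):
    """Rewrite mongo shell constructors as Extended JSON wrapper objects."""
    text = re.sub(r'ObjectId\s*\(\s*"([0-9a-fA-F]{24})"\s*\)',
                  r'{"$oid": "\1"}', text)
    text = re.sub(r'ISODate\s*\(\s*"([^"]*)"\s*\)',
                  r'{"$date": "\1"}', text)
    return text


def ext_object_hook(obj):
    """Replace {"$oid": ...} with the hex string and {"$date": ...} with a datetime."""
    if list(obj) == ["$oid"]:
        return obj["$oid"]
    if list(obj) == ["$date"] and isinstance(obj["$date"], str):
        # older fromisoformat() versions reject a trailing "Z", so swap in "+00:00"
        return datetime.fromisoformat(obj["$date"].replace("Z", "+00:00"))
    return obj


sample = ('{"_id": ObjectId("5baca841d25ce14b7d3d017c"), "country": "in", '
          '"state": "", "date": ISODate("1902-01-31T00:00:00.000Z")}')
doc = json.loads(shell_to_extended_json(sample), object_hook=ext_object_hook)
print(doc["date"].year)  # 1902
```

Because json calls object_hook on the innermost objects first, the wrappers are already replaced by the time the outer document is built.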

A couple of additional caveats:

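  • the file I was working on was a series of objects, but it wasn't a list, so I worked around that by putting everything in square brackets:
   bsondata = '['+f.read()+']'

Otherwise I would get a JSONDecodeError: Extra data: line 38 column 2 (char 1016) at the end of the first object. This only works if the objects are separated by commas; wrapping back-to-back objects without commas in brackets still fails to parse.
  • if the standalone bson package from PyPI is installed, it clashes with the bson module that ships with pymongo, so remove both and reinstall pymongo:

pip uninstall bson
pip uninstall pymongo
pip install pymongo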

Here's a paste with a complete working example.
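As an alternative to the square-bracket workaround, the standard library's json.JSONDecoder.raw_decode can pull documents off the front of a string one at a time, which works whether or not the documents are separated by commas. A minimal sketch, assuming the regex substitutions from read_mongoextjson_file have already been applied to the text (iter_concatenated_json is a hypothetical helper name; you could pass json_util.object_hook as object_hook):

```python
import json


def iter_concatenated_json(text, object_hook=None):
    """Yield each top-level JSON document from text holding several back to back.

    Documents may be separated by whitespace and/or commas.
    """
    decoder = json.JSONDecoder(object_hook=object_hook)
    pos = 0
    while True:
        # skip whitespace and separating commas between documents
        while pos < len(text) and (text[pos].isspace() or text[pos] == ","):
            pos += 1
        if pos >= len(text):
            return
        # raw_decode returns the parsed value and the index where it stopped
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


docs = list(iter_concatenated_json('{"a": 1}\n{"b": 2},\n{"c": 3}\n'))
print(docs)
```

This also avoids building one large list string, although it still needs the whole text in memory.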

CristianCantoro

I had the same problem, and the bson package only helps if you already have the data as a dict.

If you already have it in a dict, you can convert it to JSON like this:

from bson import json_util
import json

resulting_json = json.dumps(your_dict, default=json_util.default)
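Without pymongo installed, a rough standard-library stand-in for json_util.default is a hand-written default callback for json.dumps. This sketch (extended_default is a hypothetical name) only emits {"$date": ...} for datetimes and falls back to str() for anything else, which for a bson ObjectId gives its hex string; it is not equivalent to json_util's full output.

```python
import json
from datetime import datetime, timezone


def extended_default(value):
    """Fallback for json.dumps: datetimes become {"$date": ISO-8601}, others str()."""
    if isinstance(value, datetime):
        return {"$date": value.isoformat()}
    return str(value)


your_dict = {"country": "in", "date": datetime(1902, 1, 31, tzinfo=timezone.utc)}
resulting_json = json.dumps(your_dict, default=extended_default)
print(resulting_json)
```

json.dumps only calls default for objects it cannot serialize itself, so ordinary strings and numbers are untouched.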

If you have it as a string, bson will not help. So I stripped the constructor wrappers (ObjectId(...), ISODate(...)) to get a strict JSON string and converted that to a dict:

import json
import re


# Generator: yields one dict per line of the file (one document per line).
def readBsonFile(filename):
    with open(filename, "r") as data_in:
        for line in data_in:
            if not line.strip():
                continue  # skip blank lines
            # convert the TenGen JSON to Strict JSON by replacing each
            # constructor call such as ObjectId("...") or ISODate("...")
            # with just its argument
            jsondata = re.sub(r':\s*[A-Za-z]+\s*\(\s*([^()]*?)\s*\)',
                              r':\1',
                              line)

            # parse as JSON
            line_out = json.loads(jsondata)

            yield line_out

A file with this:

{ "_id" : ObjectId("5baca841d25ce14b7d3d017c"), "country" : "in", "state" : "", "date" : ISODate("1902-01-31T00:00:00.000Z")}

will output this dict:

{"_id": "5baca841d25ce14b7d3d017c",
 "country": "in",
 "state": "",
 "date": "1902-01-31T00:00:00.000Z"}
Maviles
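The stripping approach in the answer above throws type information away: both the ObjectId and the ISODate end up as plain strings. If you need a real datetime back, one option is to parse the field afterwards. A minimal sketch; restore_date and the field name "date" are assumptions taken from the example output:

```python
from datetime import datetime


def restore_date(doc, field="date"):
    """Parse the ISO-8601 string left behind by the regex back into an aware datetime."""
    value = doc.get(field)
    if isinstance(value, str):
        # older fromisoformat() versions reject a trailing "Z", so swap in "+00:00"
        doc[field] = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return doc


doc = {"_id": "5baca841d25ce14b7d3d017c", "country": "in",
       "state": "", "date": "1902-01-31T00:00:00.000Z"}
restore_date(doc)
print(doc["date"])
```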

If what you have is an actual "BSON" file (the binary format written by mongodump, not the text shown in the question), you can install bsondump from MongoDB and use it to convert the file into "Extended JSON" format, which can be parsed by JSON libraries.

bsondump < input.bson > output.json
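bsondump writes one Extended JSON document per line, so the output can be read line by line with json.loads plus an object_hook. If your bsondump emits Extended JSON v2, note that relaxed mode only uses the ISO-8601 string form for dates in years 1970 through 9999, so the 1902 date in the question would appear as {"$date": {"$numberLong": "<milliseconds>"}}. The sketch below handles both forms; ext_hook, read_bsondump_output and EPOCH are hypothetical names, and only $oid, $numberInt, $numberLong and $date are covered.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ext_hook(obj):
    """Decode a few Extended JSON wrappers into plain Python values."""
    if len(obj) == 1:
        key, value = next(iter(obj.items()))
        if key == "$oid":
            return value
        if key in ("$numberInt", "$numberLong"):
            return int(value)
        if key == "$date":
            if isinstance(value, int):  # milliseconds, possibly negative (pre-1970)
                return EPOCH + timedelta(milliseconds=value)
            return datetime.fromisoformat(value.replace("Z", "+00:00"))
    return obj


def read_bsondump_output(path):
    """Yield one decoded document per non-blank line of bsondump output."""
    with open(path) as f:
        for line in f:
            if line.strip():
                yield json.loads(line, object_hook=ext_hook)
```

Adding EPOCH and a timedelta, rather than calling datetime.fromtimestamp, avoids platform limits on negative timestamps.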
Philluminati