I want to read a JSON file that contains ObjectId and ISODate.
JSON DATA :
{
"_id" : ObjectId("5baca841d25ce14b7d3d017c"),
"country" : "in",
"state" : "",
"date" : ISODate("1902-01-31T00:00:00.000Z")
}
I want to expand a little on Maviles' answer by adding a couple of notes from a couple of other SO questions.
First, from «Unable to deserialize PyMongo ObjectId from JSON» we learn that this data looks like the Python representation of an actual BSON/MongoDB Extended JSON object. Native BSON files are also binary, not text.
Second, from «How can I use Python to transform MongoDB's bsondump into JSON?» we can expand on Fabian Fagerholm's answer:
import json
import re

from bson import json_util  # the bson package that ships with pymongo


def read_mongoextjson_file(filename):
    with open(filename, "r") as f:
        # read the entire input; in a real application,
        # you would want to read a chunk at a time
        bsondata = '[' + f.read() + ']'

    # convert the TenGen JSON to Strict JSON
    # here, I just convert the ObjectId and Date structures,
    # but it's easy to extend to cover all structures listed at
    # http://www.mongodb.org/display/DOCS/Mongo+Extended+JSON
    jsondata = re.sub(r'ObjectId\s*\(\s*\"(\S+)\"\s*\)',
                      r'{"$oid": "\1"}',
                      bsondata)
    jsondata = re.sub(r'ISODate\s*\(\s*(\S+)\s*\)',
                      r'{"$date": \1}',
                      jsondata)
    jsondata = re.sub(r'NumberInt\s*\(\s*(\S+)\s*\)',
                      r'{"$numberInt": "\1"}',
                      jsondata)

    # now we can parse this as JSON, and use MongoDB's object_hook
    # function to get rich Python data structures inside a dictionary
    data = json.loads(jsondata, object_hook=json_util.object_hook)
    return data
As you can see by comparing this version with the previous one, handling an additional type takes only one more substitution. Use the MongoDB Extended JSON reference for any other types you need.
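To see what the substitutions do without needing a file or pymongo, here is a minimal sketch that runs them on the sample document from the question. The names to_strict_json and simple_hook are my own (assumptions, not part of bson); simple_hook is only a hand-written stand-in, and with pymongo installed you would pass json_util.object_hook instead.

```python
import json
import re
from datetime import datetime, timezone

SAMPLE = '''{
"_id" : ObjectId("5baca841d25ce14b7d3d017c"),
"country" : "in",
"state" : "",
"date" : ISODate("1902-01-31T00:00:00.000Z")
}'''


def to_strict_json(text):
    """Rewrite shell-style ObjectId(...) and ISODate(...) as $oid/$date objects."""
    text = re.sub(r'ObjectId\s*\(\s*"(\S+)"\s*\)', r'{"$oid": "\1"}', text)
    text = re.sub(r'ISODate\s*\(\s*(\S+)\s*\)', r'{"$date": \1}', text)
    return text


def simple_hook(dct):
    """Hypothetical stand-in for json_util.object_hook: unwrap $oid and $date."""
    if "$oid" in dct:
        return dct["$oid"]  # keep the id as a plain hex string
    if "$date" in dct:
        parsed = datetime.strptime(dct["$date"], "%Y-%m-%dT%H:%M:%S.%fZ")
        return parsed.replace(tzinfo=timezone.utc)
    return dct


doc = json.loads(to_strict_json(SAMPLE), object_hook=simple_hook)
print(doc["_id"])
print(doc["date"].isoformat())
```

The hook is called for every JSON object as it is parsed, innermost first, so the wrapper objects are replaced before the enclosing document is built.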
A couple of additional caveats:
First, I wrap the file contents in square brackets before parsing:
bsondata = '['+f.read()+']'
Otherwise I would get a JSONDecodeError: Extra data: line 38 column 2 (char 1016) at the end of the first object.
Second, json_util has to come from the bson package that ships with pymongo. This thread, «importing json_utils issues ImportError», helped me, i.e.:
pip uninstall bson
pip uninstall pymongo
pip install pymongo
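If the file is a run of objects with no commas between them, square brackets alone will not produce valid JSON. One alternative, sketched here under that assumption, is to let json.JSONDecoder.raw_decode pull one value at a time out of the already-converted strict JSON. The function name iter_json_objects is hypothetical.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated values."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it by hand
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


concatenated = '{"country": "in"}\n{"country": "br"}\n'
print(list(iter_json_objects(concatenated)))
```

This avoids the "Extra data" error because the decoder is told where each value starts instead of being asked to parse the whole string as a single document.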
Here's a paste with a complete working example.
I had the same problem, and the bson package only helps if you already have the data in a dict.
If you already have it in a dict, you can convert it to JSON like this:
from bson import json_util
import json
resulting_json = json.dumps(your_dict, default=json_util.default)
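For context, the default argument of json.dumps is simply a function that json calls for any value it cannot serialize itself. Here is a minimal sketch of that mechanism using only the standard library. The encode_extra function is hypothetical and covers only datetime, and the output shape is my own choice, not necessarily what json_util.default emits.

```python
import json
from datetime import datetime, timezone


def encode_extra(value):
    """Called by json.dumps for values it cannot serialize on its own."""
    if isinstance(value, datetime):
        return {"$date": value.isoformat()}
    raise TypeError(f"Cannot serialize {type(value).__name__}")


your_dict = {"country": "in", "date": datetime(1902, 1, 31, tzinfo=timezone.utc)}
resulting_json = json.dumps(your_dict, default=encode_extra)
print(resulting_json)
```

Raising TypeError for unknown types keeps the usual json.dumps behaviour for anything the function does not handle.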
If you have it as a string, bson will not help. So I just stripped the ObjectId(...) and ISODate(...) wrappers to get a strict JSON string, and then converted that to a dict:
import json
import re


# Returns an iterator that converts each line of the file into a dict.
def readBsonFile(filename):
    with open(filename, "r") as data_in:
        for line in data_in:
            # convert the TenGen JSON to Strict JSON
            jsondata = re.sub(r'\:\s*\S+\s*\(\s*(\S+)\s*\)',
                              r':\1',
                              line)
            # parse as JSON
            line_out = json.loads(jsondata)
            yield line_out
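Because readBsonFile is a generator, nothing is read until you iterate over it. Here is a minimal usage sketch, assuming the file holds one document per line. The function is repeated so the block runs on its own, and the temporary file is just a stand-in for your real input.

```python
import json
import os
import re
import tempfile


def readBsonFile(filename):
    """Yield one dict per line, with Type(...) wrappers stripped by the regex."""
    with open(filename, "r") as data_in:
        for line in data_in:
            jsondata = re.sub(r'\:\s*\S+\s*\(\s*(\S+)\s*\)', r':\1', line)
            yield json.loads(jsondata)


# Write a one-line sample to a temporary file so the example runs anywhere.
sample = ('{ "_id" : ObjectId("5baca841d25ce14b7d3d017c"), "country" : "in", '
          '"state" : "", "date" : ISODate("1902-01-31T00:00:00.000Z")}\n')
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write(sample)

try:
    records = list(readBsonFile(tmp.name))  # iterating runs the generator
finally:
    os.remove(tmp.name)

print(records[0]["date"])
```

Note that the wrapped values come back as plain strings, so the date still has to be parsed if you need a datetime.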
A file with this:
{ "_id" : ObjectId("5baca841d25ce14b7d3d017c"), "country" : "in", "state" : "", "date" : ISODate("1902-01-31T00:00:00.000Z")}
will output this dict:
{ "_id" : "5baca841d25ce14b7d3d017c",
"country" : "in",
"state" : "",
"date" : "1902-01-31T00:00:00.000Z"}
If the file is a binary "BSON" dump rather than text, you can install bsondump from MongoDB and use it to convert the file into "Extended JSON" format, which can be parsed by JSON libraries.
bsondump < input.bson > output.json
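Once converted, the output can be read from Python one line at a time. This is a minimal sketch that assumes output.json holds one JSON document per line. The sample content is made up, the helper name read_json_lines is hypothetical, and io.StringIO stands in for the real file.

```python
import io
import json


def read_json_lines(handle):
    """Parse one JSON document per non-empty line from an open text file."""
    return [json.loads(line) for line in handle if line.strip()]


# Stand-in for open("output.json"); the content is a made-up sample.
fake_file = io.StringIO(
    '{"_id":{"$oid":"5baca841d25ce14b7d3d017c"},"country":"in","state":""}\n'
)
docs = read_json_lines(fake_file)
print(docs[0]["_id"]["$oid"])
```

With pymongo installed, passing object_hook=json_util.object_hook to json.loads would turn the $oid wrapper into an ObjectId instead of leaving it as a nested dict.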