As described in this post, How to Import Data in .bson File, I have a .bson file that I would like to somehow load into Stata.
The best-case scenario would be to create a .csv file, but converting it to a .json file would also be great. Then I think I can use insheetjson in Stata.
I am familiar with Python and found this post, MongoDB: BSON to JSON. The answer says that one can use the simplejson package and this code to convert BSON to JSON:
result = db.mycol.find({ ....})
json = simplejson.dumps(result)
How can I get this to work? I don't know how to load the .bson file into Python (which I think is what the db object represents). I also don't know what should go in the parentheses ({ ....}). Any suggestions? Any other simple method to get the .bson data into a .csv or .json file would also be welcome.
Update:
Taking the comment under advisement, I have done the following:
with open("filepath/games.bson", "r") as myfile:
    data = myfile.read()
# note that we need to change to unicode because of errors with some characters
data2 = unicode(data, errors='ignore')
with open('filepath/games.json', 'w') as data_file:
    json.dump(data2, data_file)
But in both data and data2, I get a result that looks like:
\x00\x02fg_pct\x00\x05\x00\x00\x00.273\x00\x10fga\x00\x0b\x00\x00\x00\x10ft\x00\x01\x00\x00\x00\x02ft_pct\x00\x05\x00\x00\x00.500\x00\x10fta\x00\x02\x00\x00\x00\x02mp\x00\x06\x00\x00\x0021:00\x00\x10orb\x00\x01\x00\x00\x00\x10pf\x00\x02\x00\x00\x00\x02player\x00\x0b\x00\x00\x00Juan Dixon\x00\x10plus_minus\x00\xee\xff\xff\xff\x10pts\x00\x08\x00\x00\x00\x10stl\x00\x01\x00\x00\x00\x10tov\x00\x01\x00\x00\x00\x10trb\x00\x02\x00\x00\x00\x00\x037\x00\xf7\x00\x00\x00\x10ast\x00\x02\x00\x00\x00\x10blk\x00\x00\x00\x00\x00\x10drb\x00\x02\x00\x00\x00\x10fg\x00\x00\x00\x00\x00\x10fg3\x00\x00\x00\x00\x00\x02fg3_pct\x00\x05\x00\x00\x00.000\x00\x10fg3a\x00\x03\x00\x00\x00\x02fg_pct\x00\x05\x00\x00\x00.000\x00\x10fga\x00\x05\x00\x00\x00\x10ft\x00\x02\x00\x00\x00\x02ft_pct\x00\x05\x00\x00\x00.500\x00\x10fta\x00\x04\x00\x00\x00\x02mp\x00\x06\x00\x00\x0020:00\x00\x10orb\x00\x00\x00\x00\x00\x10pf\x00\x02\x00\x00\x00\x02player
This doesn't seem to be what I want. If it is, I am not sure how to open it in Stata or another program.
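For reference, the bytes above match BSON's binary layout: each field is a type byte (\x02 for a string, \x10 for a 32-bit integer, \x03 for an embedded document), then a NUL-terminated field name, then the value. Reading the file as text and passing it to json.dump therefore only writes the raw bytes out as one long string. Below is a minimal sketch in Python 3 of decoding that layout with the standard library alone. It assumes the file is a plain concatenation of BSON documents and handles only doubles, strings, embedded documents, arrays, and int32 values; the function names are hypothetical. If PyMongo is installed, its bson package (for example bson.decode_file_iter) is likely the more robust choice, since it covers every BSON type.

```python
import json
import struct


def parse_document(buf, pos=0):
    """Decode one BSON document starting at buf[pos]; return (dict, next_pos)."""
    (size,) = struct.unpack_from("<i", buf, pos)  # total size, little-endian int32
    end = pos + size - 1                          # index of the trailing NUL byte
    pos += 4
    doc = {}
    while pos < end:
        type_byte = buf[pos]
        pos += 1
        name_end = buf.index(b"\x00", pos)        # field names are NUL-terminated
        name = buf[pos:name_end].decode("utf-8")
        pos = name_end + 1
        if type_byte == 0x01:                     # 64-bit float
            (value,) = struct.unpack_from("<d", buf, pos)
            pos += 8
        elif type_byte == 0x02:                   # string: int32 length, bytes, NUL
            (length,) = struct.unpack_from("<i", buf, pos)
            raw = buf[pos + 4:pos + 4 + length - 1]
            value = raw.decode("utf-8", errors="replace")
            pos += 4 + length
        elif type_byte in (0x03, 0x04):           # embedded document or array
            value, pos = parse_document(buf, pos)
            if type_byte == 0x04:                 # arrays are documents keyed "0", "1", ...
                value = list(value.values())
        elif type_byte == 0x10:                   # 32-bit signed integer
            (value,) = struct.unpack_from("<i", buf, pos)
            pos += 4
        else:
            raise ValueError(
                "unsupported BSON type 0x%02x for field %r" % (type_byte, name)
            )
        doc[name] = value
    return doc, end + 1


def parse_bson_dump(buf):
    """Decode a byte string holding zero or more concatenated BSON documents."""
    docs = []
    pos = 0
    while pos < len(buf):
        doc, pos = parse_document(buf, pos)
        docs.append(doc)
    return docs


def bson_file_to_json(src_path, dst_path):
    """Read a .bson dump in binary mode and write its documents as a JSON array."""
    with open(src_path, "rb") as src:             # "rb": BSON is binary, not text
        docs = parse_bson_dump(src.read())
    with open(dst_path, "w") as dst:
        json.dump(docs, dst)
```

Under these assumptions, calling bson_file_to_json("filepath/games.bson", "filepath/games.json") would write a JSON array with one object per document. Any field type outside the handful listed raises a ValueError rather than being silently skipped.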