
As described in this post, How to Import Data in .bson File, I have a .bson file that I would like to somehow load into Stata.

The best case scenario would be to create a .csv file, but converting it to a .json file would also be great. Then I think I can use insheetjson in Stata.

I am familiar with Python and found this post, MongoDB: BSON to JSON. Its answer says that one can use the simplejson package and this code to convert BSON to JSON:

result = db.mycol.find({ ....})
json = simplejson.dumps(result)

How can I get this to work? I don't know how to load the .bson file into Python (I think that is what the db object represents), and I don't know what should go in the parentheses ({ ....}). Any suggestions? Any other simple method of getting the .bson data into a .csv or .json file would also be welcome.
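In that snippet, `db` is not the .bson file itself: it is a pymongo `Database` object connected to a running MongoDB server, and `db.mycol` is one of its collections. `find()` takes a query filter as a dict, and an empty filter `{}` matches every document. `find()` also returns a lazy cursor rather than a list, so passing it straight to a JSON encoder generally fails. A minimal sketch under those assumptions, using the standard `json` module with `default=str` so values such as ObjectId end up as strings (`export_collection` is a hypothetical helper name, not part of pymongo):

```python
import json


def export_collection(collection, path, query=None):
    """Write every document matching `query` to `path` as one JSON array.

    `collection` is assumed to be a pymongo Collection (e.g. db.mycol).
    An empty filter {} matches every document in the collection.
    """
    # find() returns a lazy cursor, not a list, so materialise it first
    docs = list(collection.find(query if query is not None else {}))
    with open(path, "w", encoding="utf-8") as f:
        # default=str converts values json cannot encode natively
        # (e.g. ObjectId, datetime) to strings instead of raising TypeError
        json.dump(docs, f, default=str)
```

This route assumes the .bson file has first been loaded into a server (a file produced by `mongodump` can be loaded with MongoDB's `mongorestore` tool). After that, something like `export_collection(MongoClient()["mydb"]["games"], "games.json")`, with `from pymongo import MongoClient` and hypothetical database and collection names, would write the file. pymongo's `bson.json_util.dumps` is an alternative that preserves type information instead of turning everything unusual into plain strings.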

Update

Taking the comment into account, I have done the following:

import json

with open("filepath/games.bson", "r") as myfile:
    data = myfile.read()

# note that we need to change to unicode because of errors with some characters
# (Python 2: the unicode() builtin does not exist in Python 3)
data2 = unicode(data, errors='ignore')

with open('filepath/games.json', 'w') as data_file:
    json.dump(data2, data_file)

But in both data and data2, I get a result that looks like:

\x00\x02fg_pct\x00\x05\x00\x00\x00.273\x00\x10fga\x00\x0b\x00\x00\x00\x10ft\x00\x01\x00\x00\x00\x02ft_pct\x00\x05\x00\x00\x00.500\x00\x10fta\x00\x02\x00\x00\x00\x02mp\x00\x06\x00\x00\x0021:00\x00\x10orb\x00\x01\x00\x00\x00\x10pf\x00\x02\x00\x00\x00\x02player\x00\x0b\x00\x00\x00Juan Dixon\x00\x10plus_minus\x00\xee\xff\xff\xff\x10pts\x00\x08\x00\x00\x00\x10stl\x00\x01\x00\x00\x00\x10tov\x00\x01\x00\x00\x00\x10trb\x00\x02\x00\x00\x00\x00\x037\x00\xf7\x00\x00\x00\x10ast\x00\x02\x00\x00\x00\x10blk\x00\x00\x00\x00\x00\x10drb\x00\x02\x00\x00\x00\x10fg\x00\x00\x00\x00\x00\x10fg3\x00\x00\x00\x00\x00\x02fg3_pct\x00\x05\x00\x00\x00.000\x00\x10fg3a\x00\x03\x00\x00\x00\x02fg_pct\x00\x05\x00\x00\x00.000\x00\x10fga\x00\x05\x00\x00\x00\x10ft\x00\x02\x00\x00\x00\x02ft_pct\x00\x05\x00\x00\x00.500\x00\x10fta\x00\x04\x00\x00\x00\x02mp\x00\x06\x00\x00\x0020:00\x00\x10orb\x00\x00\x00\x00\x00\x10pf\x00\x02\x00\x00\x00\x02player

This doesn't seem to be what I want. If it is, I am not sure how to open it in Stata or another program.
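Those bytes are the file's actual contents: a .bson file holds binary BSON (typically several documents stored back to back), not text, so reading it and passing it to `json.dump` only writes the same binary content out again as one JSON string. Each BSON field is a one-byte type code, a NUL-terminated field name, and a typed value. In the excerpt, `\x10pts\x00\x08\x00\x00\x00` is type 0x10 (32-bit little-endian integer) named `pts` with value 8, `\x10plus_minus\x00\xee\xff\xff\xff` is the int32 -18, `\x02player\x00\x0b\x00\x00\x00Juan Dixon\x00` is type 0x02 (string) whose 4-byte length 11 counts the trailing NUL, and `\x037\x00\xf7\x00\x00\x00` starts an embedded document (type 0x03) named `7` that is 247 bytes long. The stdlib-only sketch below decodes just that subset of BSON types to show the structure; it is an illustration under that assumption, not a complete parser, and for real data pymongo's `bson.decode_all` is the safer choice.

```python
import struct


def _read_cstring(buf, pos):
    """Read a NUL-terminated UTF-8 string; return (text, position after the NUL)."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


def _decode_document(buf, pos):
    """Decode one BSON document starting at pos; return (dict, next position)."""
    (length,) = struct.unpack_from("<i", buf, pos)  # total size incl. itself
    end = pos + length - 1                          # index of trailing 0x00
    pos += 4
    doc = {}
    while pos < end:
        etype = buf[pos]
        name, pos = _read_cstring(buf, pos + 1)
        if etype == 0x01:                 # 64-bit float
            (value,) = struct.unpack_from("<d", buf, pos)
            pos += 8
        elif etype == 0x02:               # string; length includes the NUL
            (n,) = struct.unpack_from("<i", buf, pos)
            value = buf[pos + 4:pos + 4 + n - 1].decode("utf-8")
            pos += 4 + n
        elif etype in (0x03, 0x04):       # embedded document / array
            value, pos = _decode_document(buf, pos)
            if etype == 0x04:             # arrays are documents keyed "0", "1", ...
                value = list(value.values())
        elif etype == 0x07:               # ObjectId: 12 raw bytes, shown as hex
            value = buf[pos:pos + 12].hex()
            pos += 12
        elif etype == 0x08:               # boolean
            value = buf[pos] == 1
            pos += 1
        elif etype == 0x0A:               # null
            value = None
        elif etype == 0x10:               # 32-bit integer
            (value,) = struct.unpack_from("<i", buf, pos)
            pos += 4
        elif etype in (0x09, 0x12):       # datetime (ms since epoch) / int64, kept raw
            (value,) = struct.unpack_from("<q", buf, pos)
            pos += 8
        else:
            raise ValueError(f"unsupported BSON type 0x{etype:02x} for field {name!r}")
        doc[name] = value
    return doc, end + 1


def decode_bson_file_bytes(buf):
    """Decode every document in a buffer of concatenated BSON documents."""
    docs, pos = [], 0
    while pos < len(buf):
        doc, pos = _decode_document(buf, pos)
        docs.append(doc)
    return docs
```

Used as `decode_bson_file_bytes(open("filepath/games.bson", "rb").read())`, this either returns a list of dicts or stops with a `ValueError` naming a type the sketch does not handle.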

bill999
  • 2,147
  • 8
  • 51
  • 103
  • In your example, I'm pretty sure that `result` would be a pure Python dict/list. If that's the case, you could simply use Python's standard `json` lib to do `json.dumps(result)` to have a JSON valid string or `json.dump(result)` to save your JSON to a file. Did I misunderstand something? – lucasnadalutti Nov 15 '16 at 19:40
  • I have tried to do this and have updated the question with the results. – bill999 Nov 15 '16 at 22:34
  • Try replacing `with open("filepath/games.bson", "r") as myfile:` with `with open("filepath/games.bson", "rb") as myfile:` – lucasnadalutti Nov 16 '16 at 00:29
  • I just tried that and unfortunately still get all the `\x00` and whatnot. – bill999 Nov 16 '16 at 00:33

1 Answer


Try this:

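# bson here is the package bundled with pymongo (pip install pymongo),
# not the unrelated "bson" package on PyPI
import bson

with open('filepath/games.bson', 'rb') as f:
    # decode_all parses every document in the file into a list of dicts
    data = bson.decode_all(f.read())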
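`bson.decode_all` yields a list of Python dicts, which is still one step short of the .csv the question asks for. A minimal stdlib sketch of that last step, assuming each document is a dict that may contain nested dicts (as the `\x03` embedded documents in the question's dump suggest): nested keys are joined with `_` because Stata variable names cannot contain dots, and fields missing from some documents become empty cells. `flatten` and `write_csv` are hypothetical helper names, and lists are written as their Python string form rather than expanded into columns.

```python
import csv


def flatten(doc, prefix=""):
    """Flatten nested dicts into a single level, joining keys with '_'."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}_{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def write_csv(docs, path):
    """Write decoded BSON documents to a CSV file, one row per document."""
    rows = [flatten(d) for d in docs]
    # Union of all column names in first-seen order; DictWriter fills
    # columns a row lacks with its restval, which defaults to ""
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

With the `data` list from the code above, `write_csv(data, 'filepath/games.csv')` should produce a file Stata can read with `import delimited`. Flattened names that start with a digit (such as `7_ast`) may not survive as-is, since Stata variable names must begin with a letter or underscore. For JSON instead, pymongo's `bson.json_util.dumps(data)` serializes types such as ObjectId that the standard `json` module rejects.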
Axel Juraske
  • 186
  • 4