I'm looking to work on a project involving the Venmo dataset. I was able to torrent the BSON file and it's sitting on my desktop, but I don't know what to do with it. I'm not too familiar with MongoDB, and I'd like to turn it into a pandas DataFrame for analysis. Does anyone have any tips on doing so?
Viewed 4,369 times
1 Answer
5
Below is a Python example of how to read a BSON file:
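import pandas as pd
import bson  # the bson module that ships with pymongo (pip install pymongo)

FILE = "/folder/file.bson"

# Read the whole file and decode every BSON document into a list of dicts
with open(FILE, 'rb') as f:
    data = bson.decode_all(f.read())

main_df = pd.DataFrame(data)
main_df.describe()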
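If the file is too large to read in one go, a streaming variant may help. The following is only a sketch, assuming the `bson` module from pymongo, whose `decode_file_iter` yields one document at a time. The helper names `in_batches` and `bson_to_dataframe`, the `batch_size` default, and the optional `columns` filter are made up for illustration.

```python
from itertools import islice


def in_batches(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bson_to_dataframe(path, batch_size=100_000, columns=None):
    """Build a DataFrame from a BSON dump, decoding one batch at a time.

    Hypothetical helper: `columns` optionally keeps only the named fields,
    which reduces memory when the documents carry many unused keys.
    """
    # Imported here so that in_batches stays free of third-party dependencies.
    import bson  # assumption: the bson module installed by pymongo
    import pandas as pd

    frames = []
    with open(path, "rb") as f:
        # decode_file_iter reads documents lazily from the open file object
        for batch in in_batches(bson.decode_file_iter(f), batch_size):
            frame = pd.DataFrame(batch)
            if columns is not None:
                # reindex keeps the requested columns and fills missing ones with NaN
                frame = frame.reindex(columns=columns)
            frames.append(frame)
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Calling `bson_to_dataframe("/folder/file.bson")` should give the same rows as the `decode_all` version; the final DataFrame still has to fit in memory, so passing `columns` is the part that actually saves space.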

Alexey Vazhnov

Leandro Gonçalves
-
This works when `import bson` resolves to the `bson` module that comes with `pip install pymongo`. Note that it does not work with the `bson` module from `pip install bson`. If you happen to have both installed, the `bson` from `pip install pymongo` takes precedence over the one from `pip install bson`, but you can also run `pip uninstall bson` anyway. If you ever need both packages, use `pip install pybson` and then `from pybson import bson as ...` instead; this is an alternative name according to https://github.com/py-bson/bson/issues/70 – questionto42 Jun 20 '20 at 11:27
-
The current answer uses pymongo. Does anyone know how to do the same thing with the standalone bson package (= pybson)? I only got a one-row DataFrame with the following code, borrowed from https://stackoverflow.com/questions/27527982/read-bson-file-in-python: `b = open(mongodbbsonfilename, 'rb').read()` `bs = bson.loads(b)` `data = bson.decode_binary_subtype( bs, 2 )` `df = pd.DataFrame.from_dict(pd.json_normalize(data), orient='columns')` When I change `read()` to `readlines()`, the result is no longer a BSON string but a list. – questionto42 Jun 20 '20 at 11:55
-
Follow-up to the previous comment. The error is `TypeError: a bytes-like object is required, not 'list'`. I tried converting that list (see the previous comment) to a BSON string using an approach similar to the one for this JSON problem, without success: https://pythonpedia.com/en/knowledge-base/48614158/read-json-file-as-pandas-dataframe- – questionto42 Jun 20 '20 at 12:47
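On the one-row result described in the comments: a mongodump `.bson` file is a sequence of BSON documents written back to back, and each document starts with its own total length as a little-endian int32. A decoder that handles a single document would therefore stop after the first one, which may explain the single row. The sketch below splits the raw bytes on those length prefixes using only the standard library; `split_bson_documents` is a made-up helper name, and feeding each chunk to `bson.loads` from the standalone package is an untested assumption.

```python
import struct


def split_bson_documents(raw: bytes):
    """Yield the raw bytes of each document in concatenated BSON data."""
    offset = 0
    while offset < len(raw):
        if offset + 4 > len(raw):
            raise ValueError(f"truncated length prefix at offset {offset}")
        # The first four bytes of a document hold its total size,
        # counting the prefix itself and the trailing null byte.
        (size,) = struct.unpack_from("<i", raw, offset)
        # The smallest valid document (empty) is 5 bytes long.
        if size < 5 or offset + size > len(raw):
            raise ValueError(f"invalid document length {size} at offset {offset}")
        yield raw[offset:offset + size]
        offset += size


# Possible usage with the standalone package (assumption, not verified here):
#   raw = open(mongodbbsonfilename, "rb").read()
#   rows = [bson.loads(doc) for doc in split_bson_documents(raw)]
#   df = pd.DataFrame(rows)
```

This also suggests why `readlines()` does not help: BSON is a binary format, so splitting on newline bytes cuts documents at arbitrary positions.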