I have a folder with 30-something .bson files, and I can load them one at a time using this code:

    path_to_bson = 'C:/Documents/dump2020/strat_db/xyz.bson'
    data=[]
    with open(path_to_bson,'rb') as f:
        data=bson.decode_all(f.read())
    xyz=pd.DataFrame(data)

I tried to load all the files in one go, but I don't know how to proceed:

    path_to_bson = 'C:/Documents/dump2020/strat_db'
    bson_files=[pos_bson for pos_bson in os.listdir(path_to_bson) if pos_bson.endswith('.bson')]
    data=[]
    for bs in bson_files:
        with open(//what should be here?//,'rb') as f:
            //what should be here?//

I want each DataFrame to have the same name as its .bson file. For instance, xyz.bson should be loaded into a DataFrame named xyz, and so on.

nandi1596

1 Answer

You need to open and decode each .bson file, then convert the decoded documents to a pandas DataFrame.

Using the answer in this post (BSON file to pandas dataframe), I think you need something like:

    import os

    import bson  # decode_all comes from the bson module bundled with pymongo
    import pandas as pd

    path_to_bson = 'C:/Documents/dump2020/strat_db'
    bson_files = [name for name in os.listdir(path_to_bson) if name.endswith('.bson')]

    for bs in bson_files:
        # os.path.join picks the right path separator for the platform
        fileloc = os.path.join(path_to_bson, bs)
        with open(fileloc, 'rb') as f:
            data = bson.decode_all(f.read())

        main_df = pd.DataFrame(data)

        # Save next to the source file, e.g. xyz.bson -> xyz.pkl
        saveloc = os.path.splitext(fileloc)[0] + ".pkl"
        main_df.to_pickle(saveloc)

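If you would rather keep the DataFrames in memory under the file names instead of pickling them, one option is a dict keyed by the file name without its extension. This is only a sketch: `bson_stem` and `load_bson_folder` are names made up for illustration, and it assumes `bson` is the module that ships with pymongo (the one that provides `decode_all`). The third-party imports sit inside the loader so the small name helper can be used on its own.

```python
import os


def bson_stem(filename):
    """Return the file name without directory or extension, e.g. 'xyz.bson' -> 'xyz'."""
    return os.path.splitext(os.path.basename(filename))[0]


def load_bson_folder(folder):
    """Decode every .bson file in `folder` into a dict of DataFrames keyed by file stem."""
    # Imported here so bson_stem above has no third-party dependencies.
    import bson  # assumption: the bson module bundled with pymongo
    import pandas as pd

    frames = {}
    for name in sorted(os.listdir(folder)):
        if not name.endswith('.bson'):
            continue
        with open(os.path.join(folder, name), 'rb') as f:
            documents = bson.decode_all(f.read())
        frames[bson_stem(name)] = pd.DataFrame(documents)
    return frames
```

With that, `frames = load_bson_folder('C:/Documents/dump2020/strat_db')` followed by `frames['xyz']` would give the DataFrame built from xyz.bson. A dict is used rather than creating variables dynamically, since it keeps all 30-odd frames in one place and the names stay ordinary strings.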
ushham
  • main_df should be named after the .bson file (bs in bson_files). I don't know how to do that. Also, this code only returns the last .bson file as a DataFrame – nandi1596 Jan 14 '21 at 12:38
  • I had the save line incorrectly indented inside the `with` block. The line `saveloc = os.path.splitext(fileloc)[0] + ".pkl"` should take its name from the opened .bson file – ushham Jan 14 '21 at 12:50
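
To make the naming in that last comment concrete, here is a rough sketch of how the pickle path follows from the .bson path, plus one way the saved files could be read back by name later. `pickle_path_for` and `load_pickled_frames` are hypothetical helpers, not part of the answer above; the sketch assumes the pickles were written next to the .bson files as in the answer's loop.

```python
import os


def pickle_path_for(bson_path):
    """Swap the extension of a .bson path for .pkl, keeping the folder and name."""
    return os.path.splitext(bson_path)[0] + ".pkl"


def load_pickled_frames(folder):
    """Read every .pkl file in `folder` back into a dict keyed by file stem."""
    import pandas as pd  # imported lazily; only needed when reading pickles

    frames = {}
    for name in sorted(os.listdir(folder)):
        stem, ext = os.path.splitext(name)
        if ext == ".pkl":
            frames[stem] = pd.read_pickle(os.path.join(folder, name))
    return frames
```

So xyz.bson ends up as xyz.pkl in the same folder, and `load_pickled_frames(folder)['xyz']` would return that DataFrame in a later session.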