
I have approximately 50,000 .pkl files, each of which contains two pandas DataFrames, which I want to append to two large DataFrames.

I tried looping over the files, reading each one in and appending it one by one, which gets painfully slow because every pd.concat copies all of the data accumulated so far (why? see here):

import os
import pickle

import pandas as pd

DF_a = pd.DataFrame()
DF_b = pd.DataFrame()

for appended_file in os.listdir(appenddirectory):
    with open(appenddirectory + appended_file, 'rb') as data:
        df_a, df_b = pickle.load(data)   # each file holds the pair (df_a, df_b)
    DF_a = pd.concat([DF_a, df_a], axis=0)
    DF_b = pd.concat([DF_b, df_b], axis=0)

As suggested in the linked post, I am trying to build a list of DataFrames to concatenate in one go, but the only way I can think of doing that would be to rename the DataFrames inside the loop (like here), which is advised against. I also do not see how I could collect them in a dictionary and concat from there. Any advice?

1 Answer


This works:

DF_a = pd.concat([pd.read_pickle(appenddirectory + filename)[0] for filename in appendedfiles])
DF_b = pd.concat([pd.read_pickle(appenddirectory + filename)[1] for filename in appendedfiles])

since pd.read_pickle returns whatever object was pickled, so when a .pkl file holds a list (or tuple) of two DataFrames, indexing the result with [0] and [1] gives the first and second DataFrame.
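
If reading every file twice turns out to be slow, a single-pass variant is sketched below. It is only a sketch: appenddirectory and appendedfiles are placeholder names carried over from the question, the path is hypothetical, and it assumes appenddirectory ends with a path separator and contains only the .pkl files.

import os

import pandas as pd

appenddirectory = '/path/to/pkl/files/'      # hypothetical path, ends with a separator
appendedfiles = os.listdir(appenddirectory)  # assumes the folder holds only the .pkl files

# read each pickle once; every entry is the pickled (df_a, df_b) pair
pairs = [pd.read_pickle(appenddirectory + filename) for filename in appendedfiles]

DF_a = pd.concat([pair[0] for pair in pairs], axis=0)
DF_b = pd.concat([pair[1] for pair in pairs], axis=0)

Either way, building all the pieces first and calling pd.concat a single time avoids the repeated copying that makes the append-in-a-loop approach so slow.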
