I have a dictionary of dataframes that I want to combine. Ideally, I would do this directly in memory like this:
import pandas as pd

values = ['A', 'B', 'C']
dats = [dataset[x] for x in values]  # list of dataframes pulled from the dictionary of dataframes "dataset" (causes the kernel crash)
dataset_df = pd.concat(dats, sort=False, join='outer', ignore_index=True)  # concatenate the datasets
However, this causes a kernel crash, so I have to resort to pickling the dictionary to disk first and then retrieving the dataframes one by one, which is a real performance hit:
dats = [get_dataset(x) for x in values]  # get_dataset() retrieves one dataframe from disk
dataset_df = pd.concat(dats, sort=False, join='outer', ignore_index=True)  # concatenate the datasets
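For context, get_dataset() is essentially just one pickle read per key; this is a minimal sketch of the idea (the folder layout and file naming here are illustrative, not my actual code):

import pickle

def get_dataset(key, folder='./pickled'):
    # each dataframe was written to its own pickle file when the dictionary was dumped
    with open(f'{folder}/{key}.pkl', 'rb') as f:
        return pickle.load(f)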
The combined dataset fits in memory alongside the individual datasets; I have confirmed this by adding it to the dictionary of dataframes afterwards (a rough memory check is sketched at the end of the post). So why the kernel crash?
Does putting dataframes from a dict into a list somehow cause excessive memory usage?
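For completeness, this is roughly the kind of check I mean, using df.memory_usage(deep=True) as a proxy for the actual footprint (the exact numbers and variable names are illustrative):

# compare the memory of the individual dataframes with the combined one
total_bytes = sum(df.memory_usage(deep=True).sum() for df in dataset.values())
combined_bytes = dataset_df.memory_usage(deep=True).sum()
print(f'individual: {total_bytes / 1e9:.2f} GB, combined: {combined_bytes / 1e9:.2f} GB')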