
I have two huge tables as pandas objects that can hardly fit into memory, and I need to combine them into a third one:

df = pd.melt(df, id_vars='index', value_vars=cell_ids,
             var_name='cell_id', value_name='expr')
df_raw = pd.melt(df_raw, id_vars='index', value_vars=cell_ids,
                 var_name='cell_id', value_name='raw_expr')

df_combined = pd.merge(df, df_raw, on="index")

Is there a way to delete df and df_raw on the fly while creating df_combined, so that I don't get an out-of-memory error during the merge operation?

This is not a duplicate because:

I need to release the memory on the fly. I cannot just `del` the two dataframes, because then I would not be able to run `merge`. I cannot `del` after running `merge`, because the out-of-memory error would already have occurred. So I need a way of creating the merged table and destroying the input ones at the same time. I thought that maybe there are some packages or tools to actually achieve that.

Nikita Vlasenko
  • Possible duplicate of [How do I release memory used by a pandas dataframe?](https://stackoverflow.com/questions/39100971/how-do-i-release-memory-used-by-a-pandas-dataframe) – pazitos10 Oct 17 '18 at 22:37
  • Nope, it is not because I need to release the memory `on the fly`. I can not just `del` on two dataframes because I will not be able to run `merge`. I can not do `del` after running `merge` because `out of memory` error would already occur. So, I need a way of creating merged table and destroying the input ones at the same time. I thought that maybe there are some packages, software to actually achieve that. – Nikita Vlasenko Oct 17 '18 at 22:39
  • Oh, maybe I'm wrong. Sorry about that. Have you tried using [dask](http://docs.dask.org/en/latest/dataframe.html) to work with big dataframes? – pazitos10 Oct 17 '18 at 22:46
  • Nope, looking it over now. Thank you! – Nikita Vlasenko Oct 17 '18 at 22:54

1 Answer


I'm not sure if this will work for your case, but it may be worth trying. Start by splitting one of your dataframes into smaller ones, so that

df = pd.concat([df1,...,dfn])

Then merge each of the small dataframes df1,...,dfn with df_raw. After each merge, save the result to disk and release it from memory. Once all the merges are done, free all your memory, load all of the merged pieces back, and concatenate them.
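In case it helps, here is a minimal sketch of that approach. The `merge_in_chunks` helper and the pickle-on-disk format are just illustrative choices (you could equally use HDF5 or Parquet), and the toy frames stand in for the real melted tables:

```python
import os
import tempfile

import numpy as np
import pandas as pd


def merge_in_chunks(df, df_raw, on, n_chunks, workdir):
    """Merge df with df_raw piece by piece, spilling each merged piece
    to disk so only one chunk of the result is in memory at a time."""
    chunk_size = int(np.ceil(len(df) / n_chunks))
    paths = []
    for i in range(n_chunks):
        chunk = df.iloc[i * chunk_size:(i + 1) * chunk_size]
        merged = chunk.merge(df_raw, on=on)
        path = os.path.join(workdir, f"merged_{i}.pkl")
        merged.to_pickle(path)   # spill the merged piece to disk
        paths.append(path)
        del merged               # release it before the next iteration
    # At this point the caller can `del df, df_raw` before re-loading.
    return pd.concat((pd.read_pickle(p) for p in paths), ignore_index=True)


# Toy frames standing in for the melted tables from the question
df = pd.DataFrame({"index": range(6), "expr": np.arange(6.0)})
df_raw = pd.DataFrame({"index": range(6), "raw_expr": np.arange(6.0) * 10})

with tempfile.TemporaryDirectory() as tmp:
    df_combined = merge_in_chunks(df, df_raw, on="index",
                                  n_chunks=3, workdir=tmp)

print(df_combined.shape)  # (6, 3)
```

With the real data you would `del df` right after the loop (its chunks are already merged and on disk) and `del df_raw` before the final `pd.concat`, so the peak footprint is roughly one chunk plus `df_raw` instead of all three full tables.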

Tell me if you need technical advice on how to perform this.

Statistic Dean