
I finally managed to join two big DataFrames on a big machine at my school (512 GB of memory). At the moment there are two of us using the same machine; the other person is using about 120 GB of memory, and after I called the garbage collector we are at about 420 GB used.

I want to save the DataFrame to disk so that I can reuse it easily and move it to another machine. I have tried to export it to a Parquet file, but I get a memory error...

So how can I manage to dump that DataFrame to the hard drive for reuse without running into a memory error when memory is already nearly full?

Thank you

2 Answers


There are several options. You can pickle the DataFrame, or you can use the HDF5 format. These will take up less space on disk, and they will also load more quickly than other formats next time.
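
For instance, a minimal sketch of both options (the file names joined.pkl and joined.h5 and the key "df" are placeholders, and HDF5 needs the optional PyTables package):

import pandas as pd

# small stand-in for the large joined DataFrame from the question
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# pickle: simplest option, preserves dtypes and the index exactly
df.to_pickle("joined.pkl")
df_restored = pd.read_pickle("joined.pkl")

# HDF5: requires PyTables to be installed (pip install tables)
df.to_hdf("joined.h5", key="df", mode="w")
df_restored = pd.read_hdf("joined.h5", key="df")

Pickle keeps the exact dtypes and loads quickly; HDF5 written with format="table" additionally supports appending and partial reads, which can help if the frame doesn't fit in memory later.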

Vikika

I'm not sure how it would perform with a dataset that large, but you can use the pandas method to_csv to save the DataFrame to the hard drive.

df.to_csv("filename.csv")

If you're going to be working with that much data in the future, I might suggest a chunking approach like the one mentioned here: https://stackoverflow.com/a/25962187/4852976
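
As a rough sketch of the chunked route, reusing the file name from the snippet above (the chunk sizes are arbitrary, and the loop body is a placeholder):

import pandas as pd

# small stand-in for the large DataFrame
df = pd.DataFrame({"a": range(1_000_000)})

# write the CSV in batches of rows rather than all at once
df.to_csv("filename.csv", index=False, chunksize=100_000)

# on the other machine, read it back piece by piece
for chunk in pd.read_csv("filename.csv", chunksize=100_000):
    print(len(chunk))  # replace with your own processing

to_csv's chunksize controls how many rows are written at a time, and read_csv's chunksize returns an iterator of DataFrames instead of loading the whole file into memory.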

UBears