I just wrote a CSV file using pandas' `to_csv` function. The size of this file on disk is 13 GB. I now want to read it back into a pandas dataframe using `pd.read_csv`. While the file is being read, I monitor the memory usage of the server: it climbs past 30 GB, the file is never fully read in, and the kernel of my Jupyter notebook dies, so I have to start the process all over again.
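For reference, the code is essentially just the following (the file name is a placeholder and the real dataframe is built earlier in the notebook):

```python
import pandas as pd

# df is the large dataframe built earlier in the notebook
df.to_csv("data.csv", index=False)   # resulting file is ~13 GB on disk

# later, try to read it back -- this is where memory climbs past 30 GB
df2 = pd.read_csv("data.csv")
```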
My question is: why does this happen? It's a very simple piece of code to write and read the file, so why are the space requirements so different? And finally, how do I actually read this file in?