
I'm having memory problems while using Pandas on some big CSV files (more than 30 million rows). What is the best solution for this? I need to merge a couple of big tables. Thanks a lot!

  • What is the size of the CSV file, and how much RAM do you have? Did you try options like `low_memory=False` and `chunksize` while reading the data? – Kathirmani Sukumar May 12 '16 at 05:33
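
As a minimal sketch of the chunked reading suggested in the comment above (the file name, chunk size, and the per-chunk row tally are placeholder assumptions, not part of the original question):

```python
import pandas as pd

# Read the file in pieces rather than all at once; each iteration yields a DataFrame.
total_rows = 0
for chunk in pd.read_csv("big_table.csv", chunksize=1_000_000, low_memory=False):
    total_rows += len(chunk)  # stand-in for whatever per-chunk work is actually needed

print(total_rows)
```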

1 Answer


Possible duplicate of Fastest way to parse large CSV files in Pandas.

The takeaway is that if you load the CSV data often, a better approach is to parse it once (with a conventional `read_csv`) and store it in HDF5 format. Pandas (via the PyTables library) provides an efficient way to handle this [docs].
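A minimal sketch of that parse-once workflow, assuming placeholder file names and store key (`to_hdf`/`read_hdf` need the PyTables `tables` package installed):

```python
import pandas as pd

# One-time, slow text parse, then store a binary copy for fast reuse.
df = pd.read_csv("big_table.csv")
df.to_hdf("big_table.h5", key="table", mode="w")

# Later sessions read the HDF5 store instead of re-parsing the CSV.
df = pd.read_hdf("big_table.h5", key="table")
```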

Also, the answer to What is the fastest way to upload a big csv file in notebook to work with python pandas? shows timed runs (`timeit`) on a sample dataset, comparing CSV vs. csv.gz vs. Pickle vs. HDF5.
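A rough sketch of that kind of comparison, with placeholder paths and a simple wall-clock timer standing in for `timeit`:

```python
import time
import pandas as pd

df = pd.read_csv("big_table.csv")

# Save the same data in each format once.
df.to_csv("big_table.csv.gz", index=False, compression="gzip")
df.to_pickle("big_table.pkl")
df.to_hdf("big_table.h5", key="table", mode="w")

# Time a full load from each format.
loaders = [
    ("csv", lambda: pd.read_csv("big_table.csv")),
    ("csv.gz", lambda: pd.read_csv("big_table.csv.gz", compression="gzip")),
    ("pickle", lambda: pd.read_pickle("big_table.pkl")),
    ("hdf5", lambda: pd.read_hdf("big_table.h5", key="table")),
]
for name, load in loaders:
    start = time.perf_counter()
    load()
    print(f"{name}: {time.perf_counter() - start:.2f}s")
```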

Sameer Mirji
  • The problem is not in uploading the file. The problem is merging a couple of big tables. – physics_2015 May 12 '16 at 06:41
  • Your question is slightly misleading in that case. That said, the `HDF5` format still works best for your requirement. See [this](http://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas) for more clarity. – Sameer Mirji May 12 '16 at 07:00
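
Since the actual goal is merging, here is a hedged sketch of one chunked-merge approach; the file names, the join column `key`, and the assumption that the second table fits in RAM are all hypothetical:

```python
import pandas as pd

small = pd.read_csv("small_table.csv")  # assumed to fit comfortably in memory

# Merge the large table against the small one chunk by chunk, then combine.
pieces = []
for chunk in pd.read_csv("big_table.csv", chunksize=1_000_000):
    pieces.append(chunk.merge(small, on="key", how="inner"))

merged = pd.concat(pieces, ignore_index=True)
merged.to_hdf("merged.h5", key="merged", mode="w")  # store the result for fast reuse
```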