
I am trying to load CSV files into a pandas DataFrame. However, Python uses a very large amount of memory while loading them. For example, one CSV file is 289 MB on disk, but memory usage climbs to around 1700 MB while I am loading it, and at that point I get a memory error. I have also tried the chunksize option, but the problem persists. Can anyone show me a way forward?

Suman

3 Answers


OK, first things first: do not confuse disk size with memory size. A CSV is, at its core, a plain text file, whereas a pandas DataFrame is a complex object loaded in memory. That said, I can't comment on your particular case, since I don't know what your CSV contains. Instead, I'll give you an example with a CSV on my computer that has a similar size:

-rw-rw-r--  1 alex users 341M Jan 12  2017 cpromo_2017_01_12_rec.csv

Now reading the CSV:

>>> import pandas as pd
>>> df = pd.read_csv('cpromo_2017_01_12_rec.csv')
sys:1: DtypeWarning: Columns (9) have mixed types. Specify dtype option on import or set low_memory=False.
>>> df.memory_usage(deep=True).sum() / 1024**2
1474.4243307113647
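To see why a CSV grows when it is loaded, here is a minimal sketch using hypothetical generated data (the column names `name` and `score` are made up for illustration). It compares the size of the CSV text with what `memory_usage(deep=True)` reports. Part of the gap comes from each string in an object column being a separate Python object with its own overhead, and from integers being stored in fixed-width 64-bit slots by default; the exact ratio will differ from file to file.

```python
import io

import pandas as pd

# Hypothetical sample data: a short, repetitive string column and small integers.
rows = [f"user{i % 50},{i % 7}" for i in range(10_000)]
csv_text = "name,score\n" + "\n".join(rows) + "\n"

df = pd.read_csv(io.StringIO(csv_text))

# Size of the raw text versus the in-memory DataFrame.
csv_bytes = len(csv_text.encode("utf-8"))
shallow_bytes = int(df.memory_usage(deep=False).sum())  # ignores string contents
deep_bytes = int(df.memory_usage(deep=True).sum())      # includes string objects

print(f"CSV text:            {csv_bytes:>10,} bytes")
print(f"DataFrame (shallow): {shallow_bytes:>10,} bytes")
print(f"DataFrame (deep):    {deep_bytes:>10,} bytes")
```

The `deep=True` figure is the one comparable to the 1474 MB shown above; the shallow figure undercounts string columns.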

Pandas will try to keep memory usage down as much as it can, but it can't do the impossible. If you are low on memory, this answer is a good place to start. Alternatively, you could try Dask, but I think that's too much work for a CSV this small.
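One common first step, in line with the DtypeWarning above, is to declare column dtypes when reading. The sketch below uses hypothetical columns `city` and `count`: a repetitive text column stored as `category` keeps each distinct string once plus small integer codes, and small integers can use `int8` instead of the default `int64`. It assumes you already know that the values fit in the declared types.

```python
import io

import pandas as pd

# Hypothetical data: a low-cardinality text column and integers in 0..99.
cities = ["Berlin", "Paris", "Rome"]
csv_text = "city,count\n" + "\n".join(
    f"{cities[i % 3]},{i % 100}" for i in range(30_000)
) + "\n"

# Default parsing: text becomes an object/str column, integers become int64.
default_df = pd.read_csv(io.StringIO(csv_text))

# Explicit dtypes: 'category' stores each distinct city once plus integer codes;
# 'int8' holds values from -128 to 127, which covers 0..99.
typed_df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"city": "category", "count": "int8"},
)

default_mb = default_df.memory_usage(deep=True).sum() / 1024**2
typed_mb = typed_df.memory_usage(deep=True).sum() / 1024**2
print(f"default dtypes:  {default_mb:.2f} MB")
print(f"explicit dtypes: {typed_mb:.2f} MB")
```

The savings depend on how repetitive the text columns are; a column of mostly unique strings gains little from `category`.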

Alexander Ejbekov
  • I have figured out the reason. If I use skipfooter, memory usage shoots to 1.7 GB. Otherwise it's around 600 MB. – Suman Mar 19 '18 at 12:12
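A likely explanation for the skipfooter observation is that `skipfooter` is not supported by pandas' default C parser, so pandas falls back to the slower Python engine. One possible workaround, sketched below, is to count the lines in a cheap streaming pass and pass `nrows` instead, which the C engine does support. The helper name `read_csv_drop_footer` is hypothetical, and the sketch assumes one physical line per record (no blank lines and no quoted fields containing newlines).

```python
import os
import tempfile

import pandas as pd


def read_csv_drop_footer(path, n_footer, header_lines=1, **kwargs):
    """Hypothetical helper: read a CSV with the C engine while ignoring the
    last n_footer lines, by passing nrows instead of skipfooter.

    Assumes one physical line per record (no blank lines, no embedded newlines).
    """
    with open(path, "rb") as f:
        total_lines = sum(1 for _ in f)  # streaming line count, no parsing
    n_rows = total_lines - header_lines - n_footer
    return pd.read_csv(path, nrows=n_rows, engine="c", **kwargs)


# Usage with a small temporary file that ends in a one-line footer.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("a,b\n1,x\n2,y\n3,z\nTotal rows: 3\n")
    path = tmp.name

df = read_csv_drop_footer(path, n_footer=1)
os.remove(path)
print(df)
```

Because the footer never reaches the parser, column `a` also keeps an integer dtype instead of being turned into strings by the footer text.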

You can use the Dask library, for example:

# Dask DataFrames implement the pandas API
import dask.dataframe as dd
df = dd.read_csv('s3://.../2018-*-*.csv')
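Dask avoids loading everything at once by working through the data in pieces. The same idea can be sketched in plain pandas with `chunksize`, which the question mentions. Chunking only lowers peak memory if each chunk is reduced before the next one is read; concatenating all chunks back together rebuilds the full DataFrame. The helper `chunked_column_sum` and the sample data are hypothetical.

```python
import io

import pandas as pd


def chunked_column_sum(source, column, chunksize=100_000):
    """Sum one CSV column while holding at most `chunksize` rows in memory.

    Each chunk is reduced to a single number and then discarded;
    nothing is concatenated.
    """
    total = 0
    for chunk in pd.read_csv(source, usecols=[column], chunksize=chunksize):
        total += chunk[column].sum()
    return int(total)


# Usage with a hypothetical in-memory CSV; in practice `source` would be a path.
csv_text = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(1, 11)) + "\n"
result = chunked_column_sum(io.StringIO(csv_text), "b", chunksize=3)
print(result)  # 110
```

Any reduction that can be combined across chunks (sums, counts, min/max, per-group totals) fits this pattern.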

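Try it like this: 1) load the data with Dask, then 2) convert it to pandas.

import time
import dask.dataframe as dd

# 'col1' and 'col2' are placeholder column names; replace them with your own
t = time.perf_counter()  # time.clock() was removed in Python 3.8
df_train = dd.read_csv('../data/train.csv', usecols=['col1', 'col2'])
df_train = df_train.compute()  # builds a regular pandas DataFrame in memory
print("load train:", time.perf_counter() - t)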
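Since `compute()` returns an ordinary pandas DataFrame, the memory saving in this approach comes mainly from `usecols`. A minimal sketch of that effect in plain pandas, using a hypothetical five-column table of which only two columns are needed:

```python
import io

import pandas as pd

# Hypothetical wide table: five columns, but the analysis only needs two.
header = "id,name,city,score,comment"
rows = [
    f"{i},user{i},city{i % 10},{i % 100},some free text {i}"
    for i in range(20_000)
]
csv_text = header + "\n" + "\n".join(rows) + "\n"

# Read everything versus only the columns that are actually used.
all_cols = pd.read_csv(io.StringIO(csv_text))
two_cols = pd.read_csv(io.StringIO(csv_text), usecols=["id", "score"])

all_cols_bytes = int(all_cols.memory_usage(deep=True).sum())
two_cols_bytes = int(two_cols.memory_usage(deep=True).sum())
print(f"all columns: {all_cols_bytes:,} bytes")
print(f"usecols:     {two_cols_bytes:,} bytes")
```

Columns skipped with `usecols` are never turned into Python objects, so dropping wide text columns at read time is usually cheaper than dropping them afterwards.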
Yury Wallet