
I'm trying to read specific columns from a large CSV file (>1 GB), add a couple of new columns, and then write it out again.

When I try it the conventional way, the process runs out of memory:

import pandas as pd

# Only the columns that need to be read from the source file
cols = ['Event Time', 'User ID', 'Advertiser ID', 'Ad ID', 'Rendering ID',
    'Creative Version', 'Placement ID', 'Country Code',
    'Browser/Platform ID', 'Browser/Platform Version', 'Operating System ID']

# Read the selected columns, insert the two new (empty) columns, then write back out
df = pd.read_csv(file_name, sep=',', error_bad_lines=False, usecols=cols)
df.insert(7, 'Creative Size ID', '')
df.insert(3, 'Buy ID', '')
df.to_csv(file_name, sep=',', encoding='utf-8', index=False)

Is there a way to make this process more efficient?

I've tried reading in chunks with `iterator=True, chunksize=1000`, but then, when you want to write the CSV, you need to have all of your data in memory, unless `df.to_csv` can also write by chunks. Is that possible?
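For reference, here is a minimal sketch of the chunked approach I have in mind, assuming the output goes to a separate path (`out_name` below is a placeholder, and the 100,000-row chunk size is arbitrary): each processed chunk is appended to the output file, and the header is written only for the first chunk.

import pandas as pd

file_name = 'input.csv'    # placeholder paths
out_name = 'output.csv'

cols = ['Event Time', 'User ID', 'Advertiser ID', 'Ad ID', 'Rendering ID',
    'Creative Version', 'Placement ID', 'Country Code',
    'Browser/Platform ID', 'Browser/Platform Version', 'Operating System ID']

# Iterate over the file in manageable pieces instead of loading it all at once
reader = pd.read_csv(file_name, sep=',', usecols=cols, chunksize=100000)

for i, chunk in enumerate(reader):
    # Add the two new (empty) columns to each chunk
    chunk.insert(7, 'Creative Size ID', '')
    chunk.insert(3, 'Buy ID', '')
    # First chunk creates the file with a header; later chunks append without one
    chunk.to_csv(out_name, sep=',', encoding='utf-8', index=False,
                 mode='w' if i == 0 else 'a', header=(i == 0))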

ultraInstinct
  • check this answer http://stackoverflow.com/questions/25962114/how-to-read-a-6-gb-csv-file-with-pandas – Alex L Aug 02 '16 at 12:54
  • when you want to write the file from memory using df.to_csv, is there also the possibility to write it by chunks? – ultraInstinct Aug 02 '16 at 13:04
  • Yes there is a `chunksize` param for [`to_csv`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html#pandas.DataFrame.to_csv) and also for `read_csv` – EdChum Aug 02 '16 at 13:07
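For completeness, a minimal illustration of that writing-side `chunksize` parameter (the 10,000-row value is just an arbitrary example): it makes `to_csv` write the rows out in batches rather than formatting the whole output at once.

# Write the frame out in batches of 10,000 rows at a time
df.to_csv(file_name, sep=',', encoding='utf-8', index=False, chunksize=10000)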
