
I use the following code to read a LARGE CSV file (6-10 GB), insert a header row, and then export it to CSV again.

import csv
import pandas as pd

df = pd.read_csv('read file')
df.columns = ['list of headers']
df.to_csv('outfile', index=False, quoting=csv.QUOTE_NONNUMERIC)

But this approach is extremely slow and I run out of memory. Any suggestions?

Chris
  • This could help https://stackoverflow.com/questions/25962114/how-to-read-a-6-gb-csv-file-with-pandas – Mohit Motwani Nov 22 '18 at 13:58
  • @MohitMotwani I am not sure how to modify my code to implement those suggestions – Chris Nov 22 '18 at 14:05
  • You can also read line-by-line and thus might be able to discard/modify some things on the fly. Otherwise, you might want to look at Dask. – Martin Thoma Nov 22 '18 at 14:25
  • Wait ... do you simply want to insert the first line, ignoring the content? So you could do `f.seek(0, 0);f.write(headers_string)`? – Martin Thoma Nov 22 '18 at 14:27
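A minimal sketch of the chunked reading suggested in the comments, assuming the input file has no header row of its own; the file names and chunk size below are placeholders:

import csv
import pandas as pd

# Read the file in pieces so only one chunk is held in memory at a time
reader = pd.read_csv('infile.csv', header=None, chunksize=1_000_000)
first = True
for chunk in reader:
    chunk.columns = ['list of headers']
    # Write the header only with the first chunk, then append the rest
    chunk.to_csv('outfile.csv', mode='w' if first else 'a', header=first,
                 index=False, quoting=csv.QUOTE_NONNUMERIC)
    first = False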

1 Answer


Rather than reading in the whole 6GB file, could you not just add the headers to a new file, and then cat in the rest? Something like this:

import csv
import fileinput

columns = ['list of headers']

with open('outfile.csv', 'w', newline='') as outfile:
    # Write the header row first
    csv.writer(outfile, quoting=csv.QUOTE_NONNUMERIC).writerow(columns)
    # Then stream the original file line by line and append it unchanged
    with fileinput.input(files=('infile.csv',)) as f:
        for line in f:
            outfile.write(line)
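If the header-plus-copy idea works for you, copying the data in large binary blocks with shutil.copyfileobj should be faster than looping over lines; a sketch with placeholder file names:

import csv
import shutil

columns = ['list of headers']

# Write the header row to a fresh output file
with open('outfile.csv', 'w', newline='') as outfile:
    csv.writer(outfile, quoting=csv.QUOTE_NONNUMERIC).writerow(columns)

# Append the original data in 16 MB binary blocks
with open('infile.csv', 'rb') as src, open('outfile.csv', 'ab') as dst:
    shutil.copyfileobj(src, dst, length=16 * 1024 * 1024)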
Benny Mac