
I use the following code to read a LARGE CSV file (6-10 GB), insert a header row, and then export it to CSV again.

import csv
import pandas as pd

df = pd.read_csv('read file')
df.columns = ['list of headers']
df.to_csv('outfile', index=False, quoting=csv.QUOTE_NONNUMERIC)

But this approach is extremely slow and I run out of memory. Any suggestions?

Chris
  • This could help https://stackoverflow.com/questions/25962114/how-to-read-a-6-gb-csv-file-with-pandas – Mohit Motwani Nov 22 '18 at 13:58
  • @MohitMotwani I am not sure how to modify my code to implement those suggestions – Chris Nov 22 '18 at 14:05
  • You can also read line-by-line and thus might be able to discard/modify some things on the fly. Otherwise, you might want to look at Dask. – Martin Thoma Nov 22 '18 at 14:25
  • Wait ... do you simply want to insert the first line, ignoring the content? So you could do `f.seek(0, 0);f.write(headers_string)`? – Martin Thoma Nov 22 '18 at 14:27
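A minimal sketch of the chunked reading suggested in the comments, assuming the input file has no header row of its own; the file names and chunk size below are placeholders:

import csv
import pandas as pd

# Read the file in pieces so only one chunk is held in memory at a time
reader = pd.read_csv('infile.csv', header=None, chunksize=1_000_000)
first = True
for chunk in reader:
    chunk.columns = ['list of headers']
    # Write the header only with the first chunk, then append the rest
    chunk.to_csv('outfile.csv', mode='w' if first else 'a', header=first,
                 index=False, quoting=csv.QUOTE_NONNUMERIC)
    first = False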

1 Answer


Rather than reading in the whole 6GB file, could you not just add the headers to a new file, and then cat in the rest? Something like this:

import csv
import fileinput

columns = ['list of headers']

with open('outfile.csv', 'w', newline='') as outfile:
    # Write the header row first
    csv.writer(outfile, quoting=csv.QUOTE_NONNUMERIC).writerow(columns)
    # Then stream the original file line by line and append it unchanged
    with fileinput.input(files=('infile.csv',)) as f:
        for line in f:
            outfile.write(line)
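If the header-plus-copy idea works for you, copying the data in large binary blocks with shutil.copyfileobj should be faster than looping over lines; a sketch with placeholder file names:

import csv
import shutil

columns = ['list of headers']

# Write the header row to a fresh output file
with open('outfile.csv', 'w', newline='') as outfile:
    csv.writer(outfile, quoting=csv.QUOTE_NONNUMERIC).writerow(columns)

# Append the original data in 16 MB binary blocks
with open('infile.csv', 'rb') as src, open('outfile.csv', 'ab') as dst:
    shutil.copyfileobj(src, dst, length=16 * 1024 * 1024)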
Benny Mac