
I have a CSV file which is too large to fit completely into my laptop's memory (about 10 GB). Is there a way to truncate the file so that only the first n entries are saved to a new file? I started by trying

import pandas

df = pandas.read_csv("path/data.csv").as_matrix()

but this doesn't work since the memory is too small.

Any help will be appreciated!

Leon

You can use the chunksize parameter of read_csv to read the file in chunks. This lets you process the file in smaller parts, with only one part in memory at a time. The answer to this [question](https://stackoverflow.com/questions/25962114/how-to-read-a-6-gb-csv-file-with-pandas) demonstrates its use – error Jan 29 '18 at 14:27
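A minimal sketch of the chunksize approach described in the comment above, assuming the same file path as in the question; the chunk size and the per-chunk processing are illustrative assumptions:

import pandas

# Iterate over the file in pieces of 100,000 rows each, so only one
# chunk has to fit in memory at a time.
for chunk in pandas.read_csv("path/data.csv", chunksize=100000):
    print(chunk.shape)  # assumption: replace with real per-chunk work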

1 Answer


Use nrows:

import pandas

df = pandas.read_csv("path/data.csv", nrows=1000)

The nrows docs say:

Number of rows of file to read. Useful for reading pieces of large files
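
Since the goal is to save the first n entries to a new file, the nrows result can be written back out with to_csv. A minimal sketch, assuming the path from the question (the output filename is made up):

import pandas

# Read only the first 1000 rows, so the full 10 GB file is never
# loaded, then write them to a new, smaller CSV.
df = pandas.read_csv("path/data.csv", nrows=1000)
df.to_csv("path/data_first1000.csv", index=False)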

John Zwinck
Thank you, this works! Now I only have the problem that my table contains both numbers and strings, and all the strings are replaced with NaN. Do you know a fix for that? – Leon Jan 29 '18 at 14:35