
I have a CSV file which is too large to fit completely into my laptop's memory (about 10 GB). Is there a way to truncate the file so that only the first n entries are saved to a new file? I started by trying

import pandas

df = pandas.read_csv("path/data.csv").as_matrix()

but this doesn't work since the memory is too small.

Any help will be appreciated!

Leon

You can use the chunksize parameter of read_csv to read the file in chunks. This lets you process the file in smaller parts, with only one part in memory at a time. The answer to this [question](https://stackoverflow.com/questions/25962114/how-to-read-a-6-gb-csv-file-with-pandas) demonstrates its use – error Jan 29 '18 at 14:27
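A minimal sketch of the chunksize approach described in the comment above, assuming the same file path as in the question; the chunk size and the per-chunk processing are illustrative assumptions:

import pandas

# Iterate over the file in pieces of 100,000 rows each, so only one
# chunk has to fit in memory at a time.
for chunk in pandas.read_csv("path/data.csv", chunksize=100000):
    print(chunk.shape)  # assumption: replace with real per-chunk work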

1 Answer


Use nrows:

import pandas

df = pandas.read_csv("path/data.csv", nrows=1000)

The nrows docs say:

Number of rows of file to read. Useful for reading pieces of large files
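
Since the goal is to save the first n entries to a new file, the nrows result can be written back out with to_csv. A minimal sketch, assuming the path from the question (the output filename is made up):

import pandas

# Read only the first 1000 rows, so the full 10 GB file is never
# loaded, then write them to a new, smaller CSV.
df = pandas.read_csv("path/data.csv", nrows=1000)
df.to_csv("path/data_first1000.csv", index=False)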

John Zwinck
Thank you, this works! Now I only have the problem that my table contains both numbers and strings, and all the strings are replaced with NaN. Do you know a fix for that? – Leon Jan 29 '18 at 14:35