Is there a limit to the amount of rows Pandas read_csv can load?

Question

I am trying to load a .csv file using the Pandas read_csv method. The file has 29872046 rows and its total size is 2.2 GB. I notice that most of the loaded lines are missing their values for a large number of columns, yet the CSV file, when browsed from the shell, contains those values. Are there any limitations on loaded files? If not, how could this be debugged? Thanks.

There's no limit in theory. Are you using less than 8 GB of RAM? — Andy Hayden, Jun 24 '13 at 09:33
For cases like this try the iterator: http://pandas.pydata.org/pandas-docs/dev/io.html#iterating-through-files-chunk-by-chunk — Jeff, Jun 24 '13 at 11:48
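A minimal sketch of the chunked approach from the linked docs: passing `chunksize` to `read_csv` returns an iterator of DataFrames, so the row count and per-column missing-value counts can be accumulated without holding the whole file in memory. `summarize_in_chunks` is a hypothetical helper, and `isna` is the modern method name (older pandas versions use `isnull`).

```python
import io

import pandas as pd


def summarize_in_chunks(source, chunksize=1_000_000):
    """Stream a CSV chunk by chunk (hypothetical helper).

    Returns (total_rows, missing), where `missing` is a Series of
    missing-value counts per column. Only one chunk is in memory at a time.
    """
    total_rows = 0
    missing = None
    for chunk in pd.read_csv(source, chunksize=chunksize):
        total_rows += len(chunk)
        counts = chunk.isna().sum()  # missing values per column in this chunk
        # All chunks share the header's columns, so the Series align directly.
        missing = counts if missing is None else missing + counts
    return total_rows, missing
```

Comparing the returned row count with the expected 29872046 rows, and looking for columns with unexpectedly high missing counts, can help tell whether rows are being dropped or fields are being mis-parsed.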

score 5 · Accepted Answer · edited May 23 '17 at 12:04

@d1337,

I wonder if you have memory issues. There is a hint of this here.

Possibly this is relevant or this.
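To check the memory hypothesis without loading everything, one rough approach is to read a sample with `nrows`, measure it with `DataFrame.memory_usage(deep=True)`, and scale linearly to the full row count. This is a sketch assuming the sampled rows are representative of the whole file; `estimate_memory_bytes` and its `sample_rows` default are hypothetical.

```python
import io

import pandas as pd


def estimate_memory_bytes(source, total_rows, sample_rows=100_000):
    """Rough in-memory size of the full DataFrame (hypothetical helper).

    Reads only the first `sample_rows` rows, measures them with
    memory_usage(deep=True), and extrapolates linearly to `total_rows`.
    Assumes the sample is representative of the rest of the file.
    """
    sample = pd.read_csv(source, nrows=sample_rows)
    if len(sample) == 0:
        return 0
    # deep=True counts the actual size of Python string objects too
    per_row = sample.memory_usage(deep=True).sum() / len(sample)
    return int(per_row * total_rows)
```

For the file in the question, `estimate_memory_bytes("data.csv", total_rows=29_872_046) / 1e9` would give an approximate figure in GB (`data.csv` is a placeholder path). The parsed DataFrame can be considerably larger than the 2.2 GB on disk, especially when string columns are stored as Python objects.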

If I were attempting to debug it, I would do the simple thing: cut the file in half and see what happens. If that loads correctly, go up by 50%; if not, go down by 50%, until you can identify the point where it happens. You might even want to start with 20 lines just to make sure the problem is size-related.
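The halving idea can be sketched as a binary search. This version bisects on row count using `read_csv`'s `nrows` parameter rather than physically cutting the file, and it assumes the problem is monotonic (once the first n rows show it, every longer prefix does too). `first_failing_prefix`, `read_prefix`, and `is_bad` are hypothetical names; "bad" here would be whatever check matches the actual symptom, such as "contains any missing value".

```python
import io

import pandas as pd


def first_failing_prefix(read_prefix, is_bad, total_rows):
    """Smallest n such that is_bad(read_prefix(n)) is True (hypothetical helper).

    Assumes monotonicity: if the first n rows show the problem, so does
    every longer prefix. Returns None if even the full file looks fine.
    """
    if not is_bad(read_prefix(total_rows)):
        return None
    lo, hi = 1, total_rows  # the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(read_prefix(mid)):
            hi = mid  # problem already visible in the first `mid` rows
        else:
            lo = mid + 1  # more rows are needed to trigger it
    return lo
```

With `read_prefix = lambda n: pd.read_csv("data.csv", nrows=n)` and `is_bad = lambda df: df.isna().any().any()` (placeholder path), the function returns the 1-based index of the first data row containing a missing value, after roughly log2(total_rows) reads, about 25 for a file of this size. Each read of a large prefix is itself slow, so starting on a small sample as suggested above is still worthwhile.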

I'd also add your OS, memory information, and the version of Pandas you're using to your post in case they're relevant (I'm running Pandas 0.11.0, Python 3.2, and Linux Mint x64 with 16 GB of RAM, so I'd expect no issues). Also, you might post a link to your data so that someone else can test it.
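Gathering those details can be scripted. A minimal sketch using `pd.__version__` and the standard-library `sys` and `platform` modules; `environment_report` is a hypothetical name, and installed RAM is left out because the standard library has no portable call for it. Recent pandas versions also provide `pd.show_versions()`, which prints a fuller report.

```python
import platform
import sys

import pandas as pd


def environment_report():
    """Collect version and OS details worth pasting into a question."""
    return {
        "pandas": pd.__version__,
        "python": sys.version.split()[0],  # e.g. "3.11.4"
        "os": platform.platform(),
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```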

Hope that helps.
