
I am using Python 2.7 and pandas to load a fairly large CSV file (~10 GB) with the `read_csv` method. Until today this took 3-4 minutes; suddenly it has started taking hours without completing. The machine has 30 GB of RAM and multiple CPUs, and I checked that nearly all of the memory and CPUs are free. Also, the process's status is 'D' most of the time (Linux machine), which I've read usually means it is waiting on I/O.

How can I debug this to find what's causing the problem?
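One way to narrow this down (a sketch, not the asker's actual code; the chunk size is illustrative) is to read the file in chunks and time each one, which shows whether the read stalls immediately (open/seek problem) or partway through (disk or network filesystem problem):

```python
import time
import pandas as pd
from io import StringIO

# Stand-in for the real ~10 GB file; a small in-memory CSV keeps the sketch runnable.
csv_data = StringIO(u"a,b\n" + u"\n".join(u"%d,%d" % (i, i * 2) for i in range(1000)))

chunks = []
start = time.time()
# chunksize makes read_csv return an iterator of DataFrames instead of one frame.
# On the real file, a chunk of ~1e6 rows would print progress as each piece is parsed.
for i, chunk in enumerate(pd.read_csv(csv_data, chunksize=250)):
    chunks.append(chunk)
    print("chunk %d: %d rows, %.2fs elapsed" % (i, len(chunk), time.time() - start))

df = pd.concat(chunks, ignore_index=True)
print(df.shape)  # (1000, 2)
```

If every chunk is slow rather than one of them hanging, the bottleneck is more likely the storage layer (e.g. a slow NFS mount) than the parser.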

Thank you

d1337
  • What has changed: the file? your pandas install? (As a side note, if you find yourself reading in the same csv often consider using pickle or HDF5Store). – Andy Hayden Jul 10 '13 at 11:42
  • Nothing has changed; that was the first thing I verified. And I tried pickling the data, but I didn't have sufficient RAM for that even when things used to work. Is it possible that another user's actions on the server are influencing this? If so, how can I verify it? – d1337 Jul 10 '13 at 11:48
  • stupid question, but did you try a reboot if nothing has changed? – Joop Jul 10 '13 at 11:50
  • 1
    (Definitely recommend [pytables/HDF5Store](http://pandas.pydata.org/pandas-docs/stable/io.html#hdf5-pytables).) – Andy Hayden Jul 10 '13 at 11:55
  • can you print df.info? (if you have from a prior run), or about what your frame looks like (shape & dtypes)? – Jeff Jul 10 '13 at 12:56
  • Will try HDF5Store, although it might be a problem to install since I am not the admin of the machine (which is also why I can't reboot it)... Searching old run logs for the shape & dtypes. – d1337 Jul 10 '13 at 13:19
  • you can install a virtualenv: http://stackoverflow.com/questions/5844869/comprehensive-beginners-virtualenv-tutorial – Jeff Jul 10 '13 at 13:34
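Following up on the pickle/HDF5 suggestion in the comments, a minimal sketch of the caching pattern (paths and names here are illustrative; the asker notes pickling hit RAM limits on this particular data, and `to_hdf` would work the same way but requires PyTables):

```python
import os
import tempfile
import pandas as pd

def load_frame(csv_path, cache_path):
    """Parse the CSV once; later runs reuse the much faster binary cache."""
    if os.path.exists(cache_path):
        return pd.read_pickle(cache_path)
    df = pd.read_csv(csv_path)
    df.to_pickle(cache_path)
    return df

# Tiny demo standing in for the real 10 GB file.
tmp = tempfile.mkdtemp()
csv_path = os.path.join(tmp, "data.csv")
cache_path = os.path.join(tmp, "data.pkl")
pd.DataFrame({"a": range(5), "b": range(5)}).to_csv(csv_path, index=False)

df1 = load_frame(csv_path, cache_path)  # parses the CSV, writes the cache
df2 = load_frame(csv_path, cache_path)  # hits the pickle cache
print(df1.equals(df2))  # True
```

The point of the pattern is that the expensive text parse happens only on the first run; whether pickle or HDF5 is the better cache format depends on the data size and what can be installed on the machine.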

0 Answers