
I am trying to load a CSV file (around 250 MB) as a DataFrame with pandas. On my first try I used a plain read_csv call, but I got a memory error. I have since tried the chunked approach mentioned in Large, persistent DataFrame in pandas:

import pandas as pd
# read the file lazily in 1000-row chunks, then concatenate the chunks
x = pd.read_csv('myfile.csv', iterator=True, chunksize=1000)
xx = pd.concat([chunk for chunk in x], ignore_index=True)

but when I tried to concatenate I received the following error: Exception: "All objects passed were None". In fact, I cannot access the chunks at all.

I am using WinPython 3.3.2.1 (32-bit) with pandas 0.11.0.

user2082695

2 Answers


I suggest that you install the 64-bit version of WinPython. A 32-bit process only gets about 2 GB of address space, and the parsed DataFrame needs considerably more memory than the 250 MB the file occupies on disk, so that limit is easy to hit. With the 64-bit build you should be able to load a 250 MB file without problems.
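
If you are not sure which build you are currently running, here is a quick check using only the standard library (a minimal sketch, nothing WinPython-specific):

import struct
import sys

# prints 32 on a 32-bit interpreter and 64 on a 64-bit one
print(struct.calcsize("P") * 8)
# True only on a 64-bit build
print(sys.maxsize > 2**32)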

w-m

I'm late, but the actual problem with the posted code is that pd.concat([chunk for chunk in x]) effectively cancels any benefit of chunking: it concatenates all those chunks back into one big DataFrame again, which probably even requires twice the memory temporarily (the individual chunks plus the concatenated copy).
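
To actually benefit from chunking, reduce or process each chunk as it arrives instead of rebuilding the full table. A minimal sketch, assuming a made-up column name 'category' in myfile.csv:

import pandas as pd

# aggregate per chunk so only one chunk is held in memory at a time
counts = None
for chunk in pd.read_csv('myfile.csv', chunksize=1000):
    part = chunk['category'].value_counts()   # 'category' is an assumed column
    counts = part if counts is None else counts.add(part, fill_value=0)

print(counts)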

Norman