
I have a CSV file of around 800MB that I'm trying to load into a DataFrame with pandas, but I keep getting a memory error. I need to load it so I can join it to another, smaller DataFrame.

Why am I getting a memory error even though I'm using 64-bit Windows and 64-bit Python 3.4, and have over 8GB of RAM and plenty of hard disk space? Is this a bug in pandas? How can I solve this memory issue?

Nickpick
  • Possible duplicate of [Memory error when using pandas read\_csv](http://stackoverflow.com/questions/17557074/memory-error-when-using-pandas-read-csv) – hashcode55 Jun 15 '16 at 13:36
  • I knew the answer, but I forgot. – piRSquared Jun 15 '16 at 15:44
  • 1
    You already have two questions about this: [here](https://stackoverflow.com/questions/37834904/merge-csv-files-too-large-for-pandas) and [here](https://stackoverflow.com/questions/37756991/best-way-to-join-two-large-datasets-in-pandas) stop reposting – Noelkd Jun 16 '16 at 16:00

1 Answer


Reading your CSV in chunks might help:

```python
import pandas as pd

chunk_size = 10 ** 5  # rows per chunk; tune this to the memory you have available
# filename is the path to your 800MB CSV
df = pd.concat([chunk for chunk in pd.read_csv(filename, chunksize=chunk_size)],
               ignore_index=False)
```
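
Note that a plain-text CSV can take several times its on-disk size once loaded, since string columns become Python objects, which is likely why 800MB overflows even 8GB of RAM. Since the end goal is a join against a smaller DataFrame, a further option is to merge chunk by chunk and keep only the merged pieces, so the full file never has to sit in memory at once. This is a sketch rather than part of the original answer; `small_df` and the join column `key` are hypothetical placeholders for the asker's actual data:

```python
import pandas as pd

pieces = []
for chunk in pd.read_csv(filename, chunksize=10 ** 5):
    # "small_df" and the column "key" are hypothetical stand-ins for the
    # smaller DataFrame and the column the two tables are joined on.
    pieces.append(chunk.merge(small_df, on="key", how="inner"))

# Only the merged pieces are concatenated, not the full 800MB file.
result = pd.concat(pieces, ignore_index=True)
```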
MaxU - stand with Ukraine