
While trying to load a large (~5 GB) Postgres table into pandas with `pandas.read_sql("SELECT * FROM mytbl", conn)`, I got a memory error and the task failed. What is the proper way to do this?

I was able to achieve the desired result by first copying the data to a local CSV file (see Export Postgres table to CSV file with headings) and then loading it into pandas with `pandas.read_csv`. Thanks!
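A minimal sketch of that CSV workaround, assuming a psycopg2 connection; the connection string and the output path `/tmp/mytbl.csv` are placeholders, and `mytbl` is the table name from the question:

```python
import pandas as pd
import psycopg2

# Hypothetical connection string; replace with your own.
conn = psycopg2.connect("dbname=mydb")

with conn.cursor() as cur, open("/tmp/mytbl.csv", "w") as f:
    # COPY ... TO STDOUT streams the table to the file without
    # materializing it in Python memory.
    cur.copy_expert("COPY mytbl TO STDOUT WITH CSV HEADER", f)

df = pd.read_csv("/tmp/mytbl.csv")
```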

  • What do you want to do with the data in pandas? Do you want to aggregate it, as it does not fit in memory? Or another operation? Will loading it with `read_csv` not give a MemoryError as well? – joris Jan 28 '15 at 16:07
  • Yes, some sort of aggregation. It does fit in memory; I have a server machine with 64 GB of RAM – hovnatan Jan 28 '15 at 16:08
  • You are running 64-bit python and not 32-bit? – EdChum Jan 28 '15 at 16:11
  • Maybe you can have a look at the `chunksize` argument of `read_sql`. See example here: http://pandas.pydata.org/pandas-docs/stable/io.html#querying. Not sure this will solve it, but that will do the query in chunks, and you can aggregate or merge them in pandas – joris Jan 28 '15 at 16:12
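A sketch of the chunked approach suggested in the comment above, assuming the aggregation can be computed incrementally; the column names `key` and `value` and the chunk size are hypothetical:

```python
import pandas as pd

# read_sql with chunksize returns an iterator of DataFrames instead of
# loading the whole result set at once.
chunks = pd.read_sql("SELECT * FROM mytbl", conn, chunksize=100_000)

# Aggregate each chunk, then combine the partial results.
partials = [chunk.groupby("key")["value"].sum() for chunk in chunks]
result = pd.concat(partials).groupby(level=0).sum()
```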

0 Answers