10

I am trying to read in a table from my Postgres database into Python. Table has around 8 million rows and 17 columns, and has a size of 622MB in the DB.

I can export the entire table to csv using psql, and then use pd.read_csv() to read it in. It works perfectly fine. Python process only uses around 1GB of memory and everything is good.

Now, the task we need to do requires this pull to be automated, so I thought I could read the table in using pd.read_sql_table() directly from the DB. Using the following code

import sqlalchemy
engine = sqlalchemy.create_engine("postgresql://username:password@hostname:5432/db")
the_frame = pd.read_sql_table(table_name='table_name', con=engine,schema='schemaname') 

This approach starts using a lot of memory. When I track the memory usage using Task Manager, I can see the Python process memory usage climb and climb, until it hits all the way up to 16GB and freezes the computer.

Any ideas on why this might be happening is appreciated.

user4505419
  • 331
  • 1
  • 4
  • 12

1 Answers1

5

You need to set the chunksize argument so that pandas will iterate over smaller chunks of data. See this post: https://stackoverflow.com/a/31839639/3707607

Community
  • 1
  • 1
Ted Petrou
  • 59,042
  • 19
  • 131
  • 136
  • 8
    Interesting. This works for sure, ill mark the question as answered. I am still unsure why bringing in the entire dataframe from the database would require 10x more memory than reading it from a csv though. – user4505419 Dec 21 '16 at 17:13
  • Note: pandas can still potentially bring in the entire huge dataset from the database even though chunksize is used. see this comment: https://stackoverflow.com/questions/18107953/how-to-create-a-large-pandas-dataframe-from-an-sql-query-without-running-out-of#comment69051587_29522443 – Gabe Sep 25 '20 at 00:52
  • chunksize still loads all the data in memory, stream_results=True is the answer. it is server side cursor that loads the rows in given chunks and save memory.. efficiently using in many pipelines, it may also help when you load history data – Owais Ajaz Jul 29 '22 at 06:15