I've got a pandas dataframe that's about 12,000 rows, and a Postgres table that's about 8 million rows.

I'd like to run a left join on the pandas dataframe against the Postgres table.

Three ways I can think of doing this:

  1. Load the full Postgres table into a pandas dataframe, then `pd.merge` the two dataframes. Disadvantages: the notebook is tied up for ages loading 8 million rows, and the full table takes up a lot of memory.
  2. Convert the pandas dataframe into a temporary Postgres table, run the join in Postgres using psycopg2, capture the results into a dataframe, then delete the temporary table (a rough sketch of this is below). Disadvantages: fiddly.
  3. Iterate over the pandas dataframe row by row, issuing a separate query for each row and writing each result back individually. Disadvantages: not very pandas-y, and is this even practical?

I'm leaning towards (2), but is there any other approach I haven't thought of, or any other advantages/disadvantages I should consider?
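
For concreteness, here's a rough sketch of what I have in mind for (2), using SQLAlchemy on top of psycopg2. The connection string is a placeholder, and the names `key`, `some_col`, and `big_table` are stand-ins for my real join key, columns, and table:

```python
import pandas as pd
import sqlalchemy as sa

# Placeholder connection string; swap in real credentials.
engine = sa.create_engine("postgresql+psycopg2://user:pass@localhost:5432/mydb")

# Stand-in for the real ~12,000-row dataframe; "key" is the assumed join column.
small_df = pd.DataFrame({"key": [1, 2, 3], "extra": ["a", "b", "c"]})

with engine.begin() as conn:
    # Stage the small dataframe as a throwaway table on the Postgres side.
    small_df.to_sql("small_df_staging", conn, if_exists="replace", index=False)

    # Do the left join in Postgres, where the 8-million-row table lives,
    # and pull only the joined result back into pandas.
    result = pd.read_sql(
        """
        SELECT s.*, b.some_col
        FROM small_df_staging AS s
        LEFT JOIN big_table AS b ON b.key = s.key
        """,
        conn,
    )

    # Throw away the staging table.
    conn.execute(sa.text("DROP TABLE small_df_staging"))
```

Note that `to_sql` creates an ordinary table rather than a true `TEMPORARY` table, hence the explicit `DROP` at the end.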

Richard
  • You definitely don't want to try the third option. I've done row-by-row iteration in pandas myself and it is extremely slow. A workaround to make (1) faster would be to use an online notebook (e.g. Google Colab or Kaggle). – Javier Lopez Tomas Nov 12 '19 at 15:30
  • #2 for sure. You don't have to manually delete the table if you create it in a transaction and use `CREATE TEMPORARY TABLE ... ON COMMIT DROP`. – Jeremy Nov 12 '19 at 18:10
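
For reference, a minimal sketch of the transaction-scoped temporary table Jeremy describes, using raw psycopg2; the DSN, join key, and table/column names (`key`, `some_col`, `big_table`) are hypothetical:

```python
import pandas as pd
import psycopg2
from psycopg2.extras import execute_values

# Hypothetical DSN; swap in real connection details.
conn = psycopg2.connect("dbname=mydb user=me")
small_df = pd.DataFrame({"key": [1, 2, 3]})  # stand-in for the real dataframe

with conn, conn.cursor() as cur:
    # The temp table exists only for this transaction, so there is
    # nothing to clean up afterwards.
    cur.execute("CREATE TEMPORARY TABLE small_df_tmp (key integer) ON COMMIT DROP")
    execute_values(
        cur,
        "INSERT INTO small_df_tmp (key) VALUES %s",
        list(small_df.itertuples(index=False, name=None)),
    )
    cur.execute(
        """
        SELECT s.key, b.some_col
        FROM small_df_tmp AS s
        LEFT JOIN big_table AS b ON b.key = s.key
        """
    )
    result = pd.DataFrame(
        cur.fetchall(), columns=[col.name for col in cur.description]
    )
conn.close()
```

Because the temporary table is dropped automatically at commit, the whole round trip is a single transaction with no manual cleanup step.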
