
I have a dataframe that looks like this:

   name   id
0  alec  NaN
1  cole  NaN
2  troy  1.0

And I have a postgres db that reflects the same structure.

I want to insert this dataframe into my database, but I know the row at index 2 is already in the db because it has an 'id'. So I only want to insert the rows at indexes 0 and 1, since they are not yet in the database.

Then I want to get back the same dataframe with the new ids filled in, so it looks like this:

   name   id
0  alec  2.0
1  cole  3.0
2  troy  1.0

I am using psycopg2 with a PostgreSQL database. I am also inserting thousands of records at a time, so this needs to execute relatively fast.

Is there an easy way to do this?

Alec Mather
  • What about selecting rows from your dataframe with something like `filtered_df = df[df['id'].notnull()]`, then applying one of these methods (https://stackoverflow.com/questions/8134602/psycopg2-insert-multiple-rows-with-one-query) to do a batch insert into postgres using psycopg2? Is that what you are trying to do? – Beinje Dec 17 '20 at 21:30
  • Yeah that's what I'm currently doing, I was just wondering if there was an easier/short-hand way to do that. – Alec Mather Dec 17 '20 at 22:06
  • As long as you are not inserting records one by one and are using one of the psycopg2 bulk methods, you're probably good to go. This 2-step process doesn't look that complicated to me. – Beinje Dec 18 '20 at 09:13

0 Answers