
I have a dataframe that looks like this:

   name   id
0  alec  NaN
1  cole  NaN
2  troy  1.0

And I have a postgres db that reflects the same structure.

I want to insert this dataframe into my database, but I know the row at index 2 is already in the db because it has an 'id'. So I only want to insert the rows at indexes 0 and 1, since they are not yet in the database.

Then I want to get back the same dataframe with the new ids filled in, so it looks like this:

   name   id
0  alec  2.0
1  cole  3.0
2  troy  1.0

I am using psycopg2 with a PostgreSQL database. I am also inserting thousands of records at a time, so this needs to execute relatively fast.

Is there an easy way to do this?

Alec Mather
  • What about selecting rows from your dataframe with something like `filtered_df = df[df['id'].notnull()]`, then applying one of these methods (https://stackoverflow.com/questions/8134602/psycopg2-insert-multiple-rows-with-one-query) to do a batch insert into postgres using psycopg2? Is that what you are trying to do? – Beinje Dec 17 '20 at 21:30
  • Yeah that's what I'm currently doing, I was just wondering if there was an easier/short-hand way to do that. – Alec Mather Dec 17 '20 at 22:06
  • As long as you are not inserting records one by one and are using one of the psycopg2 bulk methods, you're probably good to go. This 2-step process doesn't look that complicated to me. – Beinje Dec 18 '20 at 09:13

0 Answers