0

Good morning,

I want to update my PostgreSQL database/warehouse by values from a dataframe I have loaded from an Excel file. I track historical employee data.

Technical issue

Records from the dataframe that have the same date as the records in the database/warehouse should be updated. Records that have a new date, should be appended. I'm unsure if this can be done with a relatively easy SQL query in the Python code or it is an issue where SQLAlchemy has to be used. Can you guys help me out with this?

My PostgreSQL database table looks like this:

ID (pk) empID (int) Name (varchar) Age (int) Location Date
1 27019 Jane 27 London 2021
2 28901 Joe 31 Texas 2021
3 19899 Steve 44 Amsterdam 2021

My dataframe looks like this (data that should be updated is assigned as bold format, data that should be append is in italic format):

Data to be updated: row 2 from DF to row 2 in database

Data to be appended: row 4 to row 7 from DF to database

empID (int) Name (varchar) Age (int) Location Date
27019 Jane 27 London 2021
28901 Joe 31 Berlin 2021
19899 Steve 44 Amsterdam 2021
27019 Jane 27 London 2022
21002 Jack 56 Paris 2022
28901 Joe 31 Berlin 2022
19899 Steve 44 Amsterdam 2022
17459 Frank 32 Rome 2022

My code is like:

df = pd.read_excel(r"C:\FILEPATH")

table_name = 'employee'
controlDate = []

tableDate = pd.read_sql("select * from " + table_name, connection)
pd.set_option('display.expand_frame_repr', False)

controlDate.append(tableDate)
controlDate = pd.concat(controlDate)

#This line of code appends a 0 value if the table of the database is completely empty, because it won't load an empty database as a dataframe.

emptycheck = controlDate.empty
if (emptycheck == True):
    controlDate = controlDate.append({'date':'0'}, ignore_index=True)
else:
    print("Not empty")

control = df['date'].isin(controlDate['date'])

conn = None

try:
    params = config()
    conn = psycopg2.connect(**params)
    cur = conn.cursor()
   

    postgreSQL_update_Query = #?
    cur.execute(postgreSQL_update_Query)


    if (control[1] == True):
        print("Replacing")
        cur.execute(postgreSQL_update_Query)
        cur.close()
        print("Replaced")
    elif (control[1] == False):
        print("Appending")
        df.to_sql(table_name, engine, schema="staging", if_exists='append', index=False)
        print("Appended")
    else:
        print("Failed")
    print("Employee table is loaded")

    conn.commit()

except (Exception, psycopg2.DatabaseError) as error:
    print(error)

finally:
    if conn is not None:
        conn.close()

I tried the code as described above, it did not work because I have no working SQL query that updates or appends the data. I know this code can be way shorter and I actually want to scrap the code above if necessary if it all can be done in SQL statement(s).

Trebaxe
  • 1
  • 1
  • Does this answer your question? [How to upsert pandas DataFrame to PostgreSQL table?](https://stackoverflow.com/questions/61366664/how-to-upsert-pandas-dataframe-to-postgresql-table) – Gord Thompson Nov 24 '22 at 12:48

0 Answers0