Good morning,
I want to update my PostgreSQL database/warehouse by values from a dataframe I have loaded from an Excel file. I track historical employee data.
Technical issue
Records from the dataframe that have the same date as the records in the database/warehouse should be updated. Records that have a new date, should be appended. I'm unsure if this can be done with a relatively easy SQL query in the Python code or it is an issue where SQLAlchemy has to be used. Can you guys help me out with this?
My PostgreSQL database table looks like this:
ID (pk) | empID (int) | Name (varchar) | Age (int) | Location | Date |
---|---|---|---|---|---|
1 | 27019 | Jane | 27 | London | 2021 |
2 | 28901 | Joe | 31 | Texas | 2021 |
3 | 19899 | Steve | 44 | Amsterdam | 2021 |
My dataframe looks like this (data that should be updated is assigned as bold format, data that should be append is in italic format):
Data to be updated: row 2 from DF to row 2 in database
Data to be appended: row 4 to row 7 from DF to database
empID (int) | Name (varchar) | Age (int) | Location | Date |
---|---|---|---|---|
27019 | Jane | 27 | London | 2021 |
28901 | Joe | 31 | Berlin | 2021 |
19899 | Steve | 44 | Amsterdam | 2021 |
27019 | Jane | 27 | London | 2022 |
21002 | Jack | 56 | Paris | 2022 |
28901 | Joe | 31 | Berlin | 2022 |
19899 | Steve | 44 | Amsterdam | 2022 |
17459 | Frank | 32 | Rome | 2022 |
My code is like:
df = pd.read_excel(r"C:\FILEPATH")
table_name = 'employee'
controlDate = []
tableDate = pd.read_sql("select * from " + table_name, connection)
pd.set_option('display.expand_frame_repr', False)
controlDate.append(tableDate)
controlDate = pd.concat(controlDate)
#This line of code appends a 0 value if the table of the database is completely empty, because it won't load an empty database as a dataframe.
emptycheck = controlDate.empty
if (emptycheck == True):
controlDate = controlDate.append({'date':'0'}, ignore_index=True)
else:
print("Not empty")
control = df['date'].isin(controlDate['date'])
conn = None
try:
params = config()
conn = psycopg2.connect(**params)
cur = conn.cursor()
postgreSQL_update_Query = #?
cur.execute(postgreSQL_update_Query)
if (control[1] == True):
print("Replacing")
cur.execute(postgreSQL_update_Query)
cur.close()
print("Replaced")
elif (control[1] == False):
print("Appending")
df.to_sql(table_name, engine, schema="staging", if_exists='append', index=False)
print("Appended")
else:
print("Failed")
print("Employee table is loaded")
conn.commit()
except (Exception, psycopg2.DatabaseError) as error:
print(error)
finally:
if conn is not None:
conn.close()
I tried the code as described above, it did not work because I have no working SQL query that updates or appends the data. I know this code can be way shorter and I actually want to scrap the code above if necessary if it all can be done in SQL statement(s).