I am getting data from an API using Python and then transforming this data into a Pandas Dataframe. This dataframe has a column Date
and more 50 columns. I have retrieved the data from the whole past week.
Using SQLAlchemy, I created a table in Azure SQL Server and loaded the dataframe containing the data from the whole past week using df.to_sql()
.
df.to_sql('table', engine, index=False, dtype=types)
Any past date from this API can be changed.
For example, I stored 60k rows with the column Date as of 05/03/2023, but if I query the API tomorrow and ask for this specific date (05/03/2023), there may be 62k rows (2k new rows) and even rows that changed. So, I want to do an upsert based on a selected date (such as 05/03/2023).
I have tried following this: How to upsert pandas DataFrame to Microsoft SQL Server table?
So, I created a #temp_table to store the new dataframe with the updated data from 05/03/2023 and tried to run the query below that was based in the other question I mentioned.
OBS: the columns provided in the example below are just for giving an example. In real life, I have 51 columns (Date + 50).
OBS 2: I don't have a primary key, so in the ON I am comparing the dates.
# Just to be clear, this is how I am defining my engine variable with SQLAlchemy
engine = sqlalchemy.create_engine('mssql+pyodbc:///?odbc_connect=%s' % url_db,fast_executemany=True,use_setinputsizes=False)
# Creating #temp_table with new df
new_df.to_sql('#temp_table', engine,if_exists='replace', index=False, dtype=types)
# Creating the upsert query
query = (
f"MERGE table WITH (HOLDLOCK) AS main "
f"USING (SELECT date, name, age FROM #temp_table) AS temp "
f"ON ([main].[date] = [temp].[date]) "
f"WHEN MATCHED THEN "
f"UPDATE SET [main].[date] = [temp].[date], [main].[name] = [temp].[name], [main].[age] = [temp].[age]"
f"WHEN NOT MATCHED THEN "
f"INSERT (date, name, age) VALUES ([temp].[date], [temp].[name], [temp].[age]);"
)
# Running the query
with engine.begin() as conn:
try:
conn.execute(text(query)) #GETING STUCK HERE!!!
conn.commit()
except Exception as e:
print(str(e))
conn.rollback()
I have done some print()
and found out that everything is running properly, but it gets stuck in the conn.execute(text(query))
. The code has been running for 30min and it didn't finish.
Is there anything wrong with my code? Or is it caused by the large amount of data? How can I optimize it?