I have a simple task that works, but it is taking far too long to finish. The task consists of retrieving data from an API (a JSON file) and checking whether that data is already in the database or not.
If the item is already in the database, it gets updated; otherwise it gets inserted.
I do this by looping through the JSON data, checking each item_id against a set of the ids currently in the database, and executing the appropriate statement for each case. For example:
    for item in data_json:
        if item[item_id] in setid_db:
            cursor.execute(update...)
        else:
            cursor.execute(insert...)
I have already tried fast_executemany, but it was nearly the same speed.
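To illustrate the direction of that attempt, it was roughly something like the sketch below: split the rows into an update batch and an insert batch, then send each batch in a single executemany call with fast_executemany enabled. The connection string, table name, and column names (items, item_id, value) are placeholders, not my real schema.

    import pyodbc

    conn = pyodbc.connect(CONNECTION_STRING)  # placeholder connection string
    cursor = conn.cursor()
    cursor.fast_executemany = True  # pyodbc's fast executemany mode

    # Split the incoming rows by whether their id is already in the database.
    updates = [(item["value"], item["item_id"])
               for item in data_json if item["item_id"] in setid_db]
    inserts = [(item["item_id"], item["value"])
               for item in data_json if item["item_id"] not in setid_db]

    # One round trip per batch instead of one execute per row.
    if updates:
        cursor.executemany("UPDATE items SET value = ? WHERE item_id = ?", updates)
    if inserts:
        cursor.executemany("INSERT INTO items (item_id, value) VALUES (?, ?)", inserts)
    conn.commit()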
For another task I only needed to insert all the data at once from a dataframe, so I used df.to_sql with method='multi' and chunksize=50, which worked fine and was much faster. I just don't know yet how to do the equivalent for the current task. Any help will be appreciated.
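For context, that other task looked roughly like this (the engine URL, table name, and if_exists mode here are illustrative, not the exact values I used):

    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mssql+pyodbc://user:password@my_dsn")  # placeholder URL
    df = pd.DataFrame(data_json)

    # Writes all rows in batches of 50, each batch sent as one multi-row INSERT.
    df.to_sql("items", engine, if_exists="append", index=False,
              method="multi", chunksize=50)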
PS: After the first load of data from the API (say 30k items) into the database, the next load may contain only around 200 items that need to be updated or inserted. Using df.to_sql in that scenario would overwrite all the existing data in the database. That's why I need to execute those UPDATE/INSERT statements and make them faster.