I'm trying to write a Python program that matches incoming foreign data against records in my database, which lives on another server in my organization.
The issue is that I receive millions of records at a time, and the program takes about 4 hours per 1 million records matched, with close to no CPU usage.
I've set up timers on every part of the program, and everything is fast except the part that interacts with the database through ODBC. How can I use something like multiprocessing or multithreading to speed this up, so the loop can start its next iteration instead of blocking while it waits for the database response, writes it to a file, and only then continues?
Basically, whenever an iteration hits downtime waiting for the database to respond, I want the next iteration to start immediately and then come back and write the fetched data to a file (or append it to a list), while other iterations of the same loop are also waiting on their DB responses. There's a rough sketch of what I'm imagining after the code below.
Thanks in advance.
This is what the code looks like right now:
import sys

import ibm_db


def run_query(DbConn, match_data):
    for record in match_data:
        stmt = ibm_db.prepare(
            DbConn,
            "select trim(Name) || ' ' || trim(FTHFNAME) || ' ' || trim(FTHSNAME) || ' ' || trim(FTHTNAME) AS NAME, "
            "BIRTHDATE, IDNUMSTAT from PEOPLE where NAME = ? with ur",
            {ibm_db.SQL_ATTR_CURSOR_TYPE: 3},
        )
        ibm_db.bind_param(stmt, 1, record["NAME"])
        try:
            ibm_db.execute(stmt)
        except Exception:
            print(ibm_db.stmt_errormsg(stmt))
            ibm_db.close(DbConn)
            sys.exit("Closed due to an error")
        # fetch once and reuse the row instead of calling fetch_assoc twice
        row = ibm_db.fetch_assoc(stmt)
        if row:
            print("Matched")
            print_row_to_csv(row)
        else:
            print("Not matched")
            print_row_to_csv({"Not Matched": "Not found"})