I am extremely new to this and have never used any sort of parallel processing. I want to read a huge amount of data (at least 2 million rows) from SQL Server and want to use parallel processing to speed up the reading. Below is my attempt using a concurrent.futures process pool.
import time
import pyodbc
import concurrent.futures


class DatabaseWorker(object):
    def __init__(self, connection_string, n, result_queue=[]):
        self.connection_string = connection_string
        # Grab the first n distinct rows from the table.
        stmt = "select distinct top %s * from dbo.KrishAnalyticsAllCalls" % (n)
        self.query = stmt
        self.result_queue = result_queue

    def reading(self, x):
        # Placeholder task: just hand the row straight back.
        return x

    def pooling(self):
        t1 = time.time()
        con = pyodbc.connect(self.connection_string)
        curs = con.cursor()
        curs.execute(self.query)
        with concurrent.futures.ProcessPoolExecutor(max_workers=8) as executor:
            print("Test1")
            # Submit one task per fetched row.
            future_to_read = {executor.submit(self.reading, row): row for row in curs.fetchall()}
            print("Test2")
            for future in concurrent.futures.as_completed(future_to_read):
                print("Test3")
                read = future_to_read[future]
                try:
                    print("Test4")
                    self.result_queue.append(future.result())
                except:
                    print("Not working")
        print("\nTime taken to grab this data is %s" % (time.time() - t1))


df = DatabaseWorker(r'driver={SQL Server}; server=SPROD_RPT01; database=Reporting;', 2*10**7)
df.pooling()
I am not getting any output with my current implementation: "Test1" prints and that's it; nothing else happens. I understand the various examples in the concurrent.futures documentation, but I am unable to apply them here. I would highly appreciate your help. Thank you.
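
In case it helps clarify what I am aiming for, here is a minimal sketch of the kind of chunked approach I think I should be using instead, where each worker process opens its own connection and reads one slice of the table. This is only a guess on my part; the CallID ordering column, the chunk size, and the total row count are placeholders, not my actual schema.

import concurrent.futures
import pyodbc

CONNECTION_STRING = r'driver={SQL Server}; server=SPROD_RPT01; database=Reporting;'
CHUNK_SIZE = 250000        # rows per worker; placeholder value
TOTAL_ROWS = 2 * 10**6     # rough row count; placeholder value

def read_chunk(offset):
    # Each worker process opens its own connection and reads one slice of rows.
    con = pyodbc.connect(CONNECTION_STRING)
    curs = con.cursor()
    # OFFSET/FETCH needs a deterministic ORDER BY; CallID is a placeholder column.
    curs.execute(
        "select * from dbo.KrishAnalyticsAllCalls "
        "order by CallID "
        "offset ? rows fetch next ? rows only",
        offset, CHUNK_SIZE)
    # Convert pyodbc Row objects to plain tuples so they can be sent back to the parent process.
    rows = [tuple(r) for r in curs.fetchall()]
    con.close()
    return rows

if __name__ == '__main__':
    results = []
    with concurrent.futures.ProcessPoolExecutor(max_workers=8) as executor:
        for chunk in executor.map(read_chunk, range(0, TOTAL_ROWS, CHUNK_SIZE)):
            results.extend(chunk)

The idea would be that each process does its own reading instead of the parent fetching everything first, but I am not sure whether this is the right way to use ProcessPoolExecutor here.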