
I am extremely new to this and have never used any sort of parallel processing. I want to read a huge amount of data (at least 2 million rows) from SQL Server and use parallel processing to speed up the reading. Below is my attempt using a `concurrent.futures` process pool.

import concurrent.futures
import time

import pyodbc


class DatabaseWorker(object):
    def __init__(self, connection_string, n, result_queue=[]):
        self.connection_string = connection_string
        self.query = "select distinct top %s * from dbo.KrishAnalyticsAllCalls" % (n,)
        self.result_queue = result_queue

    def reading(self, x):
        return x

    def pooling(self):
        t1 = time.time()
        con = pyodbc.connect(self.connection_string)
        curs = con.cursor()
        curs.execute(self.query)
        with concurrent.futures.ProcessPoolExecutor(max_workers=8) as executor:
            print("Test1")
            future_to_read = {executor.submit(self.reading, row): row
                              for row in curs.fetchall()}
            print("Test2")
            for future in concurrent.futures.as_completed(future_to_read):
                print("Test3")
                read = future_to_read[future]
                try:
                    print("Test4")
                    self.result_queue.append(future.result())
                except Exception as e:
                    print("Not working: %s" % e)
        print("\nTime taken to grab this data is %s" % (time.time() - t1))


df = DatabaseWorker(r'driver={SQL Server}; server=SPROD_RPT01; database=Reporting;', 2*10**7)
df.pooling()

I am not getting any output with my current implementation: "Test1" prints and then nothing else happens. I have read through the various examples in the `concurrent.futures` documentation, but I am unable to apply them here. I would highly appreciate your help. Thank you.

Krishnang K Dalal
  • You have specified a list as a default argument - I do Not know if that is your problem but you should read [“Least Astonishment” and the Mutable Default Argument](https://stackoverflow.com/q/1132941/2823755). A search using `python mutable default argument` may have other useful results. – wwii Jul 09 '18 at 14:58
  • Why do you think the bottleneck is the number of local processes fetching data? – wwii Jul 09 '18 at 20:16
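The mutable-default-argument pitfall raised in the first comment can be seen in a minimal sketch (the `Worker`/`SafeWorker` classes below are illustrative stand-ins for the question's `DatabaseWorker`, not part of the original code): the default list is created once at function definition time, so every instance constructed without an explicit `result_queue` shares the same list.

```python
class Worker:
    def __init__(self, result_queue=[]):  # default list created ONCE, then shared
        self.result_queue = result_queue


class SafeWorker:
    def __init__(self, result_queue=None):  # None sentinel: fresh list per instance
        self.result_queue = result_queue if result_queue is not None else []


a = Worker()
b = Worker()
a.result_queue.append("row")
print(a.result_queue is b.result_queue)  # True: both share the one default list
print(b.result_queue)                    # ['row']: b "sees" a's appended data

s1 = SafeWorker()
s2 = SafeWorker()
s1.result_queue.append("row")
print(s1.result_queue is s2.result_queue)  # False: each has its own list
print(s2.result_queue)                     # []
```

This sharing may not be what hangs the program here, but it is a latent bug whenever more than one `DatabaseWorker` is created.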
