Here is the code:
# database_extractor.py
class DatabaseExtractor(object):
    def __init__(self, ..):
        ...

    def run_extraction(self):
        # runs a SQL query and extracts the results to a file
        ...
# driver.py
from multiprocessing import Process

from database_extractor import DatabaseExtractor

def extract_func(db_extractor):
    db_extractor.run_extraction()
if __name__ == "__main__":
    db1 = DatabaseExtractor(..)
    db2 = DatabaseExtractor(..)
    db3 = DatabaseExtractor(..)
    db4 = DatabaseExtractor(..)
    db5 = DatabaseExtractor(..)
    db6 = DatabaseExtractor(..)
    db7 = DatabaseExtractor(..)
    db8 = DatabaseExtractor(..)
    worker_l = [Process(target=extract_func, args=(db1,)),
                Process(target=extract_func, args=(db2,)),
                Process(target=extract_func, args=(db3,)),
                Process(target=extract_func, args=(db4,)),
                Process(target=extract_func, args=(db5,)),
                Process(target=extract_func, args=(db6,)),
                Process(target=extract_func, args=(db7,)),
                Process(target=extract_func, args=(db8,))]
    for worker in worker_l: worker.start()
    for worker in worker_l: worker.join()
(In reality, the DatabaseExtractor instances are generated from an input config file, so there could be more than 8 processes running; a rough sketch of that setup is below.)
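Roughly, the config-driven version looks like the following. This is only a simplified sketch: the "extractions.json" file, its format, and the keyword-argument constructor are stand-ins for my actual config handling, not the real code.

# driver.py, config-driven sketch ("extractions.json" and its format are illustrative)
import json
from multiprocessing import Process

from database_extractor import DatabaseExtractor

def extract_func(db_extractor):
    db_extractor.run_extraction()

if __name__ == "__main__":
    # one entry per extraction; each entry holds the constructor arguments
    with open("extractions.json") as f:
        configs = json.load(f)

    extractors = [DatabaseExtractor(**cfg) for cfg in configs]

    worker_l = [Process(target=extract_func, args=(ext,)) for ext in extractors]
    for worker in worker_l: worker.start()
    for worker in worker_l: worker.join()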
I referred to the SO post Reference, quoting the accepted answer: "You'll either want to join your processes individually outside of your for loop (e.g., by storing them in a list and then iterating over it) or use something like numpy.Pool and apply_async with a callback". Even though I did exactly that (the processes are stored in a list, started, and only then joined in a separate loop), all my processes appear to run sequentially. I know this because four of the instances have queries that run for a couple of hours, and while one of them is running I do not see the other queries populating their respective output files. How can I force the instances to run in parallel?
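For reference, this is how I understand the Pool/apply_async suggestion from that answer (presumably multiprocessing.Pool). It is only a sketch: it reuses the illustrative config loading from above and assumes the DatabaseExtractor instances can be pickled so they can be handed to the worker processes.

# pool_driver.py, sketch of the Pool/apply_async approach from the quoted answer
import json
from multiprocessing import Pool

from database_extractor import DatabaseExtractor

def extract_func(db_extractor):
    db_extractor.run_extraction()

def on_done(_):
    # the callback runs in the parent process as each extraction completes
    print("one extraction finished")

if __name__ == "__main__":
    with open("extractions.json") as f:
        extractors = [DatabaseExtractor(**cfg) for cfg in json.load(f)]

    pool = Pool(processes=len(extractors))  # one worker per extraction
    for ext in extractors:
        pool.apply_async(extract_func, args=(ext,), callback=on_done)
    pool.close()  # no more tasks will be submitted
    pool.join()   # block until every extraction has finished

With processes=len(extractors) this should behave like the Process-based version; a smaller number would instead cap how many extractions run at once.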