Here is the code:

# database_extractor.py
class DatabaseExtractor(object):

    def __init__(self, ..):

        ...

    def run_extraction(self):

        # run SQL query to extract data to a file


# driver.py

from multiprocessing import Process
from database_extractor import DatabaseExtractor

def extract_func(db_extractor):

    db_extractor.run_extraction()


if __name__ == "__main__":

    db1 = DatabaseExtractor(..)
    db2 = DatabaseExtractor(..)
    db3 = DatabaseExtractor(..)
    db4 = DatabaseExtractor(..)
    db5 = DatabaseExtractor(..)
    db6 = DatabaseExtractor(..)
    db7 = DatabaseExtractor(..)
    db8 = DatabaseExtractor(..)

    worker_l = [Process(target=extract_func, args=[db1]),
                Process(target=extract_func, args=[db2]),
                Process(target=extract_func, args=[db3]),
                Process(target=extract_func, args=[db4]),
                Process(target=extract_func, args=[db5]),
                Process(target=extract_func, args=[db6]),
                Process(target=extract_func, args=[db7]),
                Process(target=extract_func, args=[db8])]

    for worker in worker_l: worker.start()

    for worker in worker_l: worker.join()

(In reality, the instances of DatabaseExtractor are being generated based on an input config file, so there could be more than 8 processes running)
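
To illustrate, here is a rough sketch of the config-driven construction; the file name and config keys below are made up, not my real ones:

# driver.py (sketch): build one extractor per entry in a config file
import json
from multiprocessing import Process
from database_extractor import DatabaseExtractor

with open("extract_config.json") as f:   # hypothetical file name
    configs = json.load(f)               # e.g. a list of per-table settings

extractors = [DatabaseExtractor(**cfg) for cfg in configs]
worker_l = [Process(target=extract_func, args=[ext]) for ext in extractors]

for worker in worker_l: worker.start()
for worker in worker_l: worker.join()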

I referred to the SO post: Reference, whose accepted answer says: "You'll either want to join your processes individually outside of your for loop (e.g., by storing them in a list and then iterating over it) or use something like numpy.Pool and apply_async with a callback". Even though I followed that advice, all my processes still run sequentially. I know this because four of the instances have queries that run for a couple of hours each, and when one of them is kicked off I do not see the other queries populating their respective output files. How can I force the instances to run in parallel?
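
For reference, my reading of the Pool/apply_async approach from that answer is roughly the following (a sketch only, using multiprocessing.Pool rather than the "numpy.Pool" in the quote; the callback is just a placeholder, and the DatabaseExtractor instances would need to be picklable):

from multiprocessing import Pool

def log_result(_):
    # runs in the parent process each time one extraction finishes
    print("one extraction finished")

if __name__ == "__main__":
    extractors = [db1, db2, db3, db4, db5, db6, db7, db8]  # the instances built above
    pool = Pool(processes=len(extractors))
    for ext in extractors:
        pool.apply_async(extract_func, args=(ext,), callback=log_result)
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for all extractions to finish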

  • Do all processes read from and write to the same database? Maybe the problem is that your processes block each other indirectly via the queries. – swenzel Aug 27 '15 at 15:57
  • They all read from the same database, but write to specific files. i.e. each instance of `DatabaseExtractor` is linked to a single database table and would have its associated output file on disk, so they don't overlap across tables. – name_masked Aug 27 '15 at 16:48

1 Answer

My guess is that something is happening at the DB layer. The example below shows that everything works as expected as far as the processes themselves are concerned; I would recommend checking your database locking, etc.

from multiprocessing import Process
from random import randint
from time import sleep

def wait_proc(i, s):
    # stand-in for a long-running query: each worker just sleeps for s seconds
    print("%d - Working for %d seconds" % (i, s))
    sleep(s)
    print("%d - Done." % (i,))

wait_l = [Process(target=wait_proc, args=[i, randint(5, 15)]) for i in range(10)]

# start every worker first, then join them all
for w in wait_l:
    w.start()

for w in wait_l:
    w.join()

print("All done.")
– Chad S.