
I have a method that calculates a final result using several other methods. It contains a while loop that continuously checks for new data; when new data is received, it runs the other methods and calculates the result. This main method is the only one called by the user, and it stays active until the program is closed. The basic structure is as follows:

class sample:
    def __init__(self):
        self.results = []

    def main_calculation(self):
        while True:
            # code to get data
            if newdata != olddata:
                # code to prepare data for analysis
                res1 = self.calc1(prepped_data)
                res2 = self.calc2(prepped_data)
                final = res1 + res2
                self.results.append(final)

I want to run calc1 and calc2 in parallel so that I can get the final result faster. However, I am unsure how to implement multiprocessing here, since I'm not using a __main__ guard. Is there any way to run these calculations in parallel?

This is probably not the best organization for this code, but it is the easiest for the actual calculations I am running, since this code has to be imported and run from a different file. I can restructure the code, though, if this structure is not salvageable.

  • Could you clarify what you think a `__main__` guard has to do with multiprocessing? I'm not quite sure I understand what you really want to know, but I hope that having the answer to that will make it more clear. – David Z Sep 05 '20 at 08:50
  • @DavidZ Hello! In all of the examples I have found so far, the lines for starting the parallel processes were inside of a main guard, and I've found reference to an error which comes up when this is not used: https://stackoverflow.com/questions/47705228/multiprocessing-a-loop-inside-a-loop-inside-a-function I have gotten this error as well when I try to start the processes in parallel from inside the while loop. Thank you! – prayingmantis Sep 06 '20 at 07:12
  • OK thanks, that does help clarify things. Could you edit your question so that it includes that information? I'll see if I can try to come up with an answer. – David Z Sep 06 '20 at 07:15

1 Answer


According to the documentation, the reason you need a __main__ guard is that when your program creates a multiprocessing.Process object (at least with the "spawn" start method, the default on Windows and macOS), it starts up a whole new copy of the Python interpreter, which imports a fresh copy of your program's modules. If importing your module itself calls multiprocessing.Process(), that starts yet another copy of the interpreter, which imports yet another copy of your code, and so on until your system crashes (or, in practice, until Python hits a non-reentrant piece of the multiprocessing code).

In the main module of your program, which usually runs some code at the top level, checking __name__ == '__main__' is how you tell whether the module is being executed as the actual program or merely being imported by one of those subprocesses. But in a different module, there may be no code at the top level other than definitions, and in that case there's no need for a guard, because the module can be safely imported without starting a new process.

In other words, this is dangerous:

import multiprocessing as mp

def f():
    ...

p = mp.Process(target=f)
p.start()
p.join()

but this is safe:

import multiprocessing as mp

def f():
    ...

def g():
    p = mp.Process(target=f)
    p.start()
    p.join()

and this is also safe:

import multiprocessing as mp

def f():
    ...

class H:
    def g(self):
        p = mp.Process(target=f)
        p.start()
        p.join()

So in your example, you should be able to create Process objects directly inside your main_calculation() method.
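
For instance, here is a minimal sketch of that approach. It assumes calc1 and calc2 are changed to put their result on a Queue instead of returning it, and newdata, olddata, prepped_data, result1 and result2 stand in for your real data-handling code:

import multiprocessing as mp

class sample:
    def __init__(self):
        self.results = []

    def calc1(self, prepped_data, queue):
        # ... do the first calculation ...
        queue.put(result1)

    def calc2(self, prepped_data, queue):
        # ... do the second calculation ...
        queue.put(result2)

    def main_calculation(self):
        while True:
            # code to get data
            if newdata != olddata:
                # code to prepare data for analysis
                q1, q2 = mp.Queue(), mp.Queue()
                p1 = mp.Process(target=self.calc1, args=(prepped_data, q1))
                p2 = mp.Process(target=self.calc2, args=(prepped_data, q2))
                p1.start()
                p2.start()
                res1 = q1.get()  # blocks until calc1 sends its result
                res2 = q2.get()  # blocks until calc2 sends its result
                p1.join()
                p2.join()
                self.results.append(res1 + res2)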

However, I'd suggest making it clear in the class's documentation that the method creates a Process, because whoever uses it (maybe you) needs to know that it's not safe to call that method at the top level of a module. Doing so would be equivalent to this, which also falls in the "dangerous" category:

import multiprocessing as mp

def f():
    ...

class H:
    def g(self):
        p = mp.Process(target=f)
        p.start()
        p.join()

H().g()  # this creates a Process at the top level
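
If you did need to call it at the top level of the script you actually run, the standard fix is to wrap that call in a __main__ guard, so that it only executes in the original process and not when a subprocess re-imports the module:

import multiprocessing as mp

def f():
    ...

class H:
    def g(self):
        p = mp.Process(target=f)
        p.start()
        p.join()

if __name__ == '__main__':
    H().g()  # runs only when the module is executed directly, not when it is re-imported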

You could also consider an alternative approach where you make the caller do all the process creation. In this approach, either your sample class constructor or the main_calculation() method could accept, say, a Pool object, and it can use the processes from that pool to do its calculations. For example:

class sample:
    def main_calculation(self, pool):
        while True:
            # code to get data
            if newdata != olddata:
                # code to prepare data for analysis
                res1_async = pool.apply_async(self.calc1, [prepped_data])
                res2_async = pool.apply_async(self.calc2, [prepped_data])
                res1 = res1_async.get()
                res2 = res2_async.get()
                # and so on
This pattern may also allow your program to be more efficient in its use of resources, if there are many different calculations happening, because they can all use the same pool of processes.
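
From the caller's side, the setup might look roughly like this (the module name your_module is just a stand-in for wherever the sample class actually lives):

import multiprocessing as mp
from your_module import sample

if __name__ == '__main__':
    with mp.Pool(processes=2) as pool:
        s = sample()
        s.main_calculation(pool)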

David Z