10

I have not been able to implement the suggestion here: Applying two functions to two lists simultaneously.

I guess it is because the module is imported by another module, and thus on Windows multiple Python processes are spawned?

My question is: how can I use the code below without the if __name__ == "__main__": guard?

import multiprocessing as mp

args_m = [(mortality_men, my_agents, graveyard, families, firms, year, agent) for agent in males]
args_f = [(mortality_women, fertility, year, families, my_agents, graveyard, firms, agent) for agent in females]

with mp.Pool(processes=(mp.cpu_count() - 1)) as p:
    p.map_async(process_males, args_m)
    p.map_async(process_females, args_f)

Both process_males and process_females are functions; args_m and args_f are lists of argument tuples.

Also, I don't need to return anything. Agents are class instances that need updating.

B Furtado
  • It should be ok if you move this code into a function that gets called by the main script, which is running as `__main__`, instead of using top-level module code. – Eryk Sun Mar 05 '17 at 02:12

4 Answers

8

The reason you need to guard multiprocessing code with if __name__ == "__main__" is that you don't want it to run again in the child processes. That can happen on Windows, where the interpreter needs to reload all of its state because there is no fork system call that would copy the parent process's address space. But you only need the guard around code that runs at the top level of the main script; it's not the only way to protect your code.

In your specific case, I think you should put the multiprocessing code in a function. That won't run in the child process, as long as nothing else calls the function when it should not. Your main module can import the module, then call the function (from within an if __name__ == "__main__" block, probably).

It should be something like this:

some_module.py:

import multiprocessing as mp

def process_males(x):
    ...

def process_females(x):
    ...

args_m = [...] # these could be defined inside the function below if that makes more sense
args_f = [...]

def do_stuff():
    with mp.Pool(processes=(mp.cpu_count() - 1)) as p:
        res_m = p.map_async(process_males, args_m)
        res_f = p.map_async(process_females, args_f)
        # wait for both batches inside the with block so the pool
        # isn't terminated before the workers finish
        res_m.wait()
        res_f.wait()

main.py:

import some_module

if __name__ == "__main__":
    some_module.do_stuff()

In your real code you might want to pass some arguments or get a return value from do_stuff (which should also be given a more descriptive name than the generic one I've used in this example).
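For example, a sketch of a parameterized version (the name run_demographics, its signature, and the explicit .get() calls are illustrative additions, not code from the question):

def run_demographics(args_m, args_f):
    # same idea as do_stuff above, but the argument lists come in as
    # parameters and the workers' results are returned to the caller
    with mp.Pool(processes=(mp.cpu_count() - 1)) as p:
        res_m = p.map_async(process_males, args_m)
        res_f = p.map_async(process_females, args_f)
        # .get() blocks until both batches finish, so the pool is not
        # torn down while work is still running
        return res_m.get(), res_f.get()

main.py would then build args_m and args_f and call some_module.run_demographics(args_m, args_f) from inside its if __name__ == "__main__": block, just like do_stuff above.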

Blckknght
  • But that is exactly the point. I cannot call it from `main.py`. My full program is as follows: `main.py` initiates the process, then a parametrization is called, then agents are loaded or created, then a time iteration module takes place. Time iteration calls `running_month`, where I need `do_stuff()`. On top of that, for multiple simulations, I call main.py 100 times. In sum, the ordering of events prevents me from having do_stuff inside if __name__... – B Furtado Mar 06 '17 at 12:59
  • You can have the call come via some other function if you want. Ultimately, it's the script code running in main.py that's going to kick off all the work (or it needs to be, if your code is going to work with multiprocessing). Put everything you don't want to run an extra time in the child processes inside a function, and only call those functions from inside the `if __name__ == "__main__"` guard in main.py (or from other functions). – Blckknght Mar 06 '17 at 16:42
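For what it's worth, a rough sketch of that layering, using made-up module and function names based on the structure described in the comments above (time_iteration, simulate and running_month are assumptions about the project layout):

# main.py -- the only file that needs the guard
import time_iteration

if __name__ == "__main__":
    for run in range(100):              # the repeated simulations still start here
        time_iteration.simulate(run)

# time_iteration.py -- safe to import; nothing runs at import time
import some_module

def simulate(run):
    for month in range(12):
        running_month(month)

def running_month(month):
    # the Pool inside do_stuff is only created when this chain of calls
    # is kicked off from main.py's __main__ block
    some_module.do_stuff()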
4

The idea of if __name__ == '__main__': is to avoid infinite process spawning.

When pickling a function defined in your main script, Python has to figure out which part of your main script is the function's code, so it basically re-runs your script in the child process. If the code that creates the Pool is in that same script and not protected by the "if main" guard, then trying to import the function will launch another Pool that will try to launch another Pool....

Thus you should separate the function definitions from the actual main script:

from multiprocessing import Pool

# define the test function outside __main__
# so it can be imported without launching
# a new Pool
def test_func():
    pass

if __name__ == '__main__':
    with Pool(4) as p:
        r = p.apply_async(test_func)
        # ... do stuff
        result = r.get()
Thomas Moreau
  • Can I import test_func() from another module and then have `p.apply_async(other_module.test_func)`? The thing is, I cannot trigger test_func until I have the data. Then, I need to call it repeatedly, every month. I'll try further... Thanks. – B Furtado Mar 05 '17 at 17:34
  • Yes you should be able to import `test_func` in another module without issue. The issues arise for functions that are defined in the `__main__` module. The functions passed to `apply_async` should just be attached to a regular module. – Thomas Moreau Mar 05 '17 at 23:24
  • I think this answer misses the main point of the question: the `Pool`-creating code is supposed to be in another module, not the main one. Thus you can't put an `if __name__ == "__main__"` guard around it, since it's not in the main module at all. – Blckknght Mar 06 '17 at 04:37
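A minimal sketch of that layout, with worker_module as a made-up name; the point is only that the function handed to apply_async lives in an importable module rather than in __main__:

# worker_module.py
def test_func(x):
    # ordinary module-level function, picklable by reference
    return x * 2

# main.py
import multiprocessing as mp
import worker_module

if __name__ == '__main__':
    with mp.Pool(4) as p:
        r = p.apply_async(worker_module.test_func, (21,))
        print(r.get())    # prints 42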
0

I cannot comment on the question yet, but a workaround I have used, which some have mentioned, is to define the process_males etc. functions in a module different from the one where the processes are spawned, and then import that module where the multiprocessing happens.

0

I solved it by calling the module's multiprocessing function within the if __name__ == "__main__": block of the main script. Since the function that involves multiprocessing is the last step in my module, others could try this if applicable.

blackcat