25

Running Python 2.7 on Windows 7 (64-bit).

The docs for the multiprocessing library module stress several times the importance of the __main__ module, including the conditional (especially on Windows):

if __name__ == "__main__":
    # create Process() here

My understanding is that you don't want to create Process() instances in the global namespace of the module, because when the child process imports the module, it will inadvertently spawn yet another process.

I do not have to place Process managers at the very top level of my package execution hierarchy, though (execution in the PARENT), as long as my Process() instances are created, managed, and terminated in a class method, or even in a function closure, just not in the top-level module namespace.

Am I understanding this warning/requirement correctly?


EDIT

After the first two responses, I am adding this quotation from the introduction to Section 16.6 (multiprocessing) of the 2.7 docs.

Note: Functionality within this package requires that the __main__ module be importable by the children. This is covered in Programming guidelines however it is worth pointing out here. This means that some examples, such as the multiprocessing.Pool examples will not work in the interactive interpreter...

user2097818
  • Pedantic note: variables in `if __name__ == '__main__':` are still in the **namespace** of the module when the code actually runs. The code isn't however executed when the module is imported. (I.e. I believe that if you import the main module of a program you can retrieve variables from its main block as module attributes.) – millimoose Nov 26 '13 at 16:50
  • So @millimoose variables created inside the `if` are accessible to all spawned processes on Windows machine, correct? Even if they have not been declared outside the `if`? – Kartik Jun 30 '16 at 08:27
  • @Kartik - I'm not sure I understand the question, it's been three years anyway. I think they will be accessible but their values will be bogus. I suggest you write some test code to find out what you have in mind, and post a new question on SO if you have any specific issues with it. – millimoose Jun 30 '16 at 13:29

2 Answers

34

You do not have to call Process() from the "top level" of the module. It is perfectly fine to call Process from a class method.

The only caveat is that you cannot allow Process() to be called when the module is imported.

Since Windows has no fork, the multiprocessing module starts a new Python process and imports the calling module. If Process() gets called upon import, each new process spawns another, and this continues indefinitely (or until your machine runs out of resources). This is the reason for hiding calls to Process() inside

if __name__ == "__main__":

since statements inside this if-statement are not executed on import.
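As a minimal sketch of that arrangement, written to run on both Python 2.7 and Python 3: Process() is created inside a class method, and only the guarded block at the bottom triggers it. The names describe, work, and Launcher are hypothetical and chosen for illustration; they do not come from the question or from the multiprocessing docs.

```python
import multiprocessing
import os


def describe(label):
    """Build the line a worker prints (a plain function, easy to check)."""
    return "%s running in pid %d" % (label, os.getpid())


def work(label):
    """Target of the child process. It must live at module level so the
    child can find it after re-importing this module on Windows."""
    print(describe(label))


class Launcher(object):
    """Hypothetical helper: Process() is created in a method, not on import."""

    def run(self, label):
        child = multiprocessing.Process(target=work, args=(label,))
        child.start()
        child.join()
        return child.exitcode


if __name__ == "__main__":
    # Only the directly executed script reaches this block. A child that
    # re-imports the module skips it, so no further processes are spawned.
    print(describe("parent"))
    print("child exit code: %s" % Launcher().run("child"))
```

Importing this file only defines the functions and the class. Nothing is spawned until some caller, itself not running at import time, invokes Launcher().run().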

unutbu
  • I believe I understand your point here. I have made an edit to the original question that better illustrates my confusion. – user2097818 Nov 26 '13 at 16:53
  • Q: *"Why would it NEED to be able to import `__main__`?"*. A: On Windows, calling `Process()` causes the calling module to be imported. When using `multiprocessing`, you need to code with the expectation that the calling module will get imported. – unutbu Nov 26 '13 at 16:58
  • I think I am over-analyzing. I will plan for my multiprocessing module to be imported. In fact, it will never be executed, because my program is also going to import it, and must interact with a Factory class before any Process() instances are created. – user2097818 Nov 26 '13 at 18:01
  • Does it mean the warning in [joblib's documentation](https://pythonhosted.org/joblib/parallel.html), saying that no *code* should be executed outside the `if __name__ == '__main__'` block, is an overkill? – Ziyuan Feb 18 '15 at 16:06
  • @ziyuang: What is important is that you understand what happens to code outside the `if-block` -- in particular, that on Windows every spawned process will re-import the calling module and thus re-execute all code outside the `if-block`. The `joblib` doc says, "only imports and definitions". Definitions can include definitions of variables as well as functions. Just be sure not to spawn subprocesses outside the `if-block` since (on Windows) that surely leads to a [fork bomb](http://en.wikipedia.org/wiki/Fork_bomb). – unutbu Feb 18 '15 at 17:52
  • @unutbu: What if I have a variable that gets created inside the `if-block`, but needs to be used by a function passed to the pool of workers later on? Basically, will this work on Windows: `if __name__ == '__main__': var = 10` <\n> `with Pool(2) as p: p.map(partial(my_funct, var), task_list)`? (Note that `with Pool...` is inside the `if-block`) – Kartik Jun 30 '16 at 08:19
3

__name__ is only ever equal to "__main__" if the script has been executed directly, either via python foo.py or python -m foo. This ensures that Process() will not be called if the script is imported as a module instead.
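The difference can be observed without starting any processes. The sketch below is an illustration only: it writes a tiny throwaway script to a temporary file and executes it twice with the standard library's runpy.run_path, once under the name "__main__" (as python foo.py would) and once under the name "foo" (as import foo would). The SOURCE text and the run_as helper are hypothetical names made up for this example.

```python
import os
import runpy
import tempfile

# A throwaway script: one module-level statement and one guarded statement.
SOURCE = (
    "events = ['module-level code ran']\n"
    "if __name__ == '__main__':\n"
    "    events.append('guarded block ran')\n"
)


def run_as(run_name):
    """Execute SOURCE from a temporary file with __name__ set to run_name."""
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(SOURCE)
        # run_path returns the globals of the executed file as a dict.
        return runpy.run_path(path, run_name=run_name)["events"]
    finally:
        os.remove(path)


as_script = run_as("__main__")  # comparable to: python foo.py
as_import = run_as("foo")       # comparable to: import foo

print(as_script)  # ['module-level code ran', 'guarded block ran']
print(as_import)  # ['module-level code ran']
```

The module-level statement runs in both cases; only the guarded statement depends on how the file was started, which is why a Process() call belongs behind the guard or inside a function that is called from it.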

Ignacio Vazquez-Abrams
  • So multiprocessing cannot be used in a module? – jul Apr 15 '16 at 09:30
  • @jul It can be, but the actual constructor calls to `Process()` must be wrapped as a "Singleton Command". This is because Windows will not only *clone* the process, but it must re-execute a copy of the python interpreter, which means the "main module" will be imported twice. Essentially, keep your import "activity" to a minimum. After the import is complete, you can do whatever you want with the classes and functions in your module. – user2097818 Jul 28 '17 at 09:56
  • @user2097818 Could you explain in a bit more detail what you mean by a "Singleton Command"? Essentially I am wondering if I can produce a function (defined in some module I will import) and execute that function without wrapping it in "if __name__ == '__main__'" that internally used multiprocessing? – Ymareth Aug 03 '17 at 21:00
  • To be pedantic, isn't `__name__ == '__main__'` always true for `__main__.py`? – Solomon Ucko Apr 04 '19 at 21:22