Using multiprocessing in Python, what is the correct approach for import statements?

Question

PEP 8 states:

Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

However, if the class/method/function that I am importing is used only by a child process, surely it is more efficient to do the import where it is needed? My code is basically:

p = multiprocessing.Process(target=main,args=(dump_file,))
p.start()
p.join()
print u"Process ended with exitcode: {}".format(p.exitcode)
if os.path.getsize(dump_file) > 0:
    blc = BugLogClient(listener='http://21.18.25.06:8888/bugLog/listeners/bugLogListenerREST.cfm',appName='main')
    blc.notifyCrash(dump_file)

main() is the main application. This function needs a lot of imports to run, and those take up some RAM (roughly 35 MB). Because the application runs in another process, following PEP 8 meant the imports were done twice (once by the parent process and once by the child process). It should also be noted that this function is called only once: the parent process just waits to see whether the application crashed and left an exit code (thanks to faulthandler). So I coded the imports inside the main function like this:

def main(dump_file):

    import shutil
    import locale

    import faulthandler

    from PySide.QtCore import Qt
    from PySide.QtGui import QApplication, QIcon

instead of:

import shutil
import locale

import faulthandler

from PySide.QtCore import Qt
from PySide.QtGui import QApplication, QIcon

def main(dump_file):

Is there a 'standard' way to handle imports when using multiprocessing?

PS: I've seen this sister question

How do you know how much memory you're saving with your proposed method? — gardenhead, Jan 05 '16 at 19:14
I can see how much memory each process takes using Windows Task Manager. With my proposed method the parent process takes 6 MB, and following PEP 8 it takes 36 MB. — Andrés Marafioti, Jan 05 '16 at 19:19
Moving your `main()` function into a separate file is not an option? — Finwood, Jan 05 '16 at 19:23
Already did that, but I need to import that script to call it with multiprocessing. — Andrés Marafioti, Jan 05 '16 at 19:24
Well... you're right, didn't think of that. What about moving the _contents_ of `main()` into a separate module? Then you could follow PEP8 and still only import when needed — Finwood, Jan 05 '16 at 19:26
I'm not following your line of thought. If I move the contents of main() to another module, importing that module following PEP8 will import everything it needs to work, right? And we're back where we started. — Andrés Marafioti, Jan 05 '16 at 19:31
This is embarrassing. Yes, you are right again, my 'solution' won't help at all. Maybe it's just too late here, I'll be back if I get a working idea :) — Finwood, Jan 05 '16 at 19:36
On unixy systems, `multiprocessing` is a cheap `fork` where children have a copy-on-write view of the parent's space. Nothing is reimported and the "import at the top" rule makes sense. On Windows, `multiprocessing` needs to pickle the environment and rebuild it in the child process space. Reducing imports makes a lot of sense there even if it is not fully PEP 8. It's a hack anyway... so continue the tradition in your code! — tdelaney, Jan 05 '16 at 19:47

score 2 · Accepted Answer · answered Jan 06 '16 at 13:26

The 'standard' way is the one described in PEP 8. That is what PEP 8 is for: a reference guide for writing Python code.

There are always exceptions, though. This case is one of them.

Windows does not clone the parent process's memory, so when a child process is spawned it starts a fresh interpreter and must re-import the modules it needs, including everything imported at the top level of the main script. On Linux, where `multiprocessing` has traditionally forked the parent, the child inherits the already imported modules and this issue does not arise.
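A minimal sketch of the deferred-import pattern from the question, written for Python 3 and assuming the `spawn` start method (the one Windows uses). The standard-library module `colorsys` is only a hypothetical stand-in for a heavy dependency such as PySide, and `crash.dump` is a placeholder file name; neither comes from the original code.

```python
import multiprocessing
import sys

# Hypothetical stand-in for a heavy dependency such as PySide.
HEAVY_MODULE = "colorsys"


def heavy_module_loaded():
    """Report whether the stand-in module has been imported in this process."""
    return HEAVY_MODULE in sys.modules


def main(dump_file):
    # Deferred import: executed only in the process that actually calls main().
    import colorsys  # noqa: F401
    # ... the real application would start here ...
    print("child has it loaded:", heavy_module_loaded())


if __name__ == "__main__":
    # With "spawn", the child starts a fresh interpreter and re-imports this
    # module, so anything imported at the top of the file is loaded twice.
    ctx = multiprocessing.get_context("spawn")
    p = ctx.Process(target=main, args=("crash.dump",))  # placeholder file name
    p.start()
    p.join()
    print("Process ended with exitcode: {}".format(p.exitcode))
    print("parent has it loaded:", heavy_module_loaded())
```

Run as a script, the parent should never import the stand-in module, because the only import statement for it lives inside `main()`. The trade-off is that an import error surfaces only when `main()` runs, not when the file is first loaded.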

I'm not familiar with Windows memory management, but I'd say the modules are shared and not loaded twice. What you probably see is the virtual memory of the two processes, not the physical memory. In physical memory, only one copy of the modules should be loaded.

It is up to you whether to follow PEP 8 or not. When resources are a constraint, the code needs to adapt. But do not over-optimize the code if it is not necessary; that is the wrong approach.
