
This is similar to a previous question, but for multiprocessing instead of subprocess. It seems that dynamically changing PYTHONHASHSEED has no effect on processes created with multiprocessing, unlike those created with subprocess:

#check_environ.py
import os, multiprocessing, subprocess, sys

s = 'hello'
print('parent', os.getenv('PYTHONHASHSEED'), hash(s))

# optionally pass a seed on the command line to set it for the children
if len(sys.argv) > 1:
    os.environ['PYTHONHASHSEED'] = sys.argv[1]
subprocess.call(['python', '-c', "import os; print('subprocess', os.getenv('PYTHONHASHSEED'), hash('{}'))".format(s)])
multiprocessing.Process(target=lambda: print('multiprocessing', os.getenv('PYTHONHASHSEED'), hash(s))).start()

Sample runs:

# explicit PYTHONHASHSEED for subprocess/multiprocessing 
$ python check_environ.py 12

parent None 4472558296122225349
subprocess 12 -8207222429063474615
multiprocessing 12 4472558296122225349

# random PYTHONHASHSEED for subprocess/multiprocessing 
$ python check_environ.py

parent None 7990499464460966677
subprocess None 1081030409066486350
multiprocessing None 7990499464460966677

So no matter what, the multiprocessing child uses the same hash seed as the parent. Is there a way to force processes spawned by multiprocessing to use a different hash seed?

– gsakkis

2 Answers


You can, by using a start method other than 'fork' for process creation. Your OS must currently be using fork, since you don't get a PicklingError for using a lambda as target.

You can change the start method to 'spawn' (the default and only option on Windows) with multiprocessing.set_start_method('spawn'), or to 'forkserver' if available. List all available methods with multiprocessing.get_all_start_methods().

#check_environ.py
import sys, os, subprocess
import multiprocessing as mp


def show(s):
    print('multiprocessing', os.getenv('PYTHONHASHSEED'), hash(s))


if __name__ == '__main__':

    mp.set_start_method('spawn')  # can only be called once per program run

    s = 'hello'
    print('parent', os.getenv('PYTHONHASHSEED'), hash(s))

    if len(sys.argv) > 1:
        os.environ['PYTHONHASHSEED'] = sys.argv[1]

    cmd = "import os; " \
          "print('subprocess', os.getenv('PYTHONHASHSEED'), hash('{}'))"
    subprocess.call(['python', '-c', cmd.format(s)])
    p = mp.Process(target=show, args=(s,))
    p.start()
    p.join()

Output in terminal:

$ python check_environ.py 12

parent None 4279361553958749032
subprocess 12 -8207222429063474615
multiprocessing 12 -8207222429063474615

If you need to switch between start methods multiple times, use a context object for setting the start method (a runnable sketch follows the snippet below):

ctx = mp.get_context('spawn')
p = ctx.Process(target=foo, args=(var,))
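
For illustration, a minimal self-contained version of that pattern; foo and var in the snippet above are placeholders, filled in here with a trivial target function and argument:

# context_demo.py -- minimal sketch of the get_context pattern
import multiprocessing as mp

def foo(var):  # placeholder target from the snippet above
    print('child received:', var)

if __name__ == '__main__':
    # get_context affects only this context object, not the global default
    ctx = mp.get_context('spawn')
    p = ctx.Process(target=foo, args=('hello',))
    p.start()
    p.join()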

But be prepared to pay a massive time penalty for using a start method other than fork. I benchmarked just starting up a Python process on my machine running Ubuntu 18.04 with:

  • fork 1.59 ms
  • forkserver 289.83 ms
  • spawn 348.20 ms

But that overhead may not be relevant for your use case.
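
For reference, a rough sketch of how such a startup benchmark might be reproduced (the timing loop and run count are my own choices, not the script used above; absolute numbers will differ per machine):

# bench_start_methods.py -- rough benchmark sketch; numbers vary per machine
import time
import multiprocessing as mp

def noop():
    pass

def bench(method, runs=20):
    ctx = mp.get_context(method)
    start = time.perf_counter()
    for _ in range(runs):
        p = ctx.Process(target=noop)
        p.start()
        p.join()
    # average milliseconds per process start/join cycle
    return (time.perf_counter() - start) / runs * 1000

if __name__ == '__main__':
    for method in mp.get_all_start_methods():
        print(method, round(bench(method), 2), 'ms')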

– Darkonaut
  • Note: If you use a `multiprocessing.Pool`, rather than manually created `Process` objects, the penalty should be less; sure, it will still cost more to create the pool, but the worker processes are reused by many tasks, rather than one new spawned process per task (see the sketch after these comments). – ShadowRanger Aug 30 '18 at 01:15
  • Hi @Darkonaut, I have a question about running `multiprocessing.Pool` on Windows laptop [here](https://stackoverflow.com/questions/66445724/why-does-this-parallel-process-run-infinitely-on-windows). I hope that you can take some time have a check on this question. Thank you so much for your help! – Akira Mar 03 '21 at 11:16
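
A rough illustration of ShadowRanger's point, with a placeholder task of my own (not from the thread): a spawn-context Pool pays the startup cost once per worker, and each worker is then reused across many tasks:

# pool_demo.py -- sketch: workers are spawned once and reused for many tasks
import multiprocessing as mp

def square(x):  # placeholder task
    return x * x

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    with ctx.Pool(processes=4) as pool:         # 4 spawn startups, paid once
        results = pool.map(square, range(100))  # 100 tasks reuse the 4 workers
    print(results[:5])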

Each subprocess call launches a fresh Python interpreter in a new OS process, and that interpreter reads PYTHONHASHSEED at startup. With multiprocessing under the fork start method there is no fresh startup: the child shares the interpreter state inherited from the parent, whose hash seed was already fixed when the parent started.
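
A minimal sketch of that inheritance, assuming a POSIX system where the 'fork' start method is available:

# fork_demo.py -- sketch (POSIX-only): the forked child sees the updated
# environment variable, but hash() still uses the seed fixed at parent startup
import os
import multiprocessing as mp

def child():
    print('child sees PYTHONHASHSEED =', os.environ.get('PYTHONHASHSEED'))
    print('child hash:', hash('hello'))  # same value as the parent's hash

if __name__ == '__main__':
    os.environ['PYTHONHASHSEED'] = '12'  # too late for this interpreter
    print('parent hash:', hash('hello'))
    ctx = mp.get_context('fork')
    p = ctx.Process(target=child)
    p.start()
    p.join()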

– sophros