
I am using pool.map to run my function in parallel. The function generates a random value, but every process returns the same result. How can I generate different random values in each process? Here is an example:

import numpy as np
import scipy
from numpy.random import randn
from multiprocessing import Pool

def testfun(X):
    scipy.random.seed
    y = randn()
    return y


pool = Pool(processes=8)

result = pool.map(testfun, np.arange(8))

I want to have 8 different values.

liang wang
  • I added scipy.random.seed based on https://stackoverflow.com/questions/29854398/seeding-random-number-generators-in-parallel-programs, but it doesn't help – liang wang Dec 14 '21 at 16:36

3 Answers


What you need to do is seed the random number generator once in each process of the multiprocessing pool (rather than on each call to randn), and the way to do that is with a pool initializer, i.e. the initializer argument of the multiprocessing.Pool constructor. The function you specify is called once in each pool process before that process executes any tasks, so it can be used for any one-time initialization, such as setting global variables or, in this case, seeding that process's random number generator.

import numpy as np
from numpy.random import randn, seed


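def init_pool_processes():
    # Runs once in each worker process before it executes any task:
    # reseed NumPy's global generator so the workers don't share one state.
    seed()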

def testfun(X):
    y = randn()
    return y

# Required by Windows:
if __name__ == '__main__':
    from multiprocessing import Pool

    pool = Pool(processes=8, initializer=init_pool_processes)

    result = pool.map(testfun, np.arange(8))
    print(result)

Prints:

[-0.01738709180801345, -0.6941424935875462, 0.41955492420787543, -0.890711442154167, -0.6894630549510319, 1.1549486347982545, -0.27329303494286733, 0.16447656347746123]
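An alternative sketch, not part of this answer: NumPy's newer Generator API can give every task its own independent stream. SeedSequence.spawn derives child seeds from one root seed, and default_rng turns each child into a Generator that is sent along with the task. The root seed 12345 and the names make_rngs and draw are arbitrary choices for illustration. Unlike seed() with no arguments, this gives the same eight values on every run.

```python
from multiprocessing import Pool

import numpy as np


def make_rngs(n, root_seed=12345):
    """Return n independent Generators derived from one root seed."""
    children = np.random.SeedSequence(root_seed).spawn(n)
    return [np.random.default_rng(child) for child in children]


def draw(rng):
    # Each task receives a pickled copy of its own Generator.
    return rng.standard_normal()


if __name__ == "__main__":
    with Pool(processes=8) as pool:
        result = pool.map(draw, make_rngs(8))
    print(result)
```

Because the stream travels with the task rather than living in the worker, the result does not depend on which worker happens to run which task.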
Booboo
  • Thanks for the answer. How come the processes don't end up with the same seed at the end (as seed() is called without any arguments)? Is it because processes are initialized sequentially (so we don't have the same problem where seed uses the same timestamp for all processes as they are run in parallel)? – Kobe-Wan Kenobi Oct 19 '22 at 10:29
  • Yes. I suppose that on some platform the clock read by whatever system call Python uses might not advance smoothly, so you could end up with the same seed, but I don't think that would occur in practice. See [this demo](https://ideone.com/DSyoWb). – Booboo Oct 19 '22 at 12:21
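On the seeding question in these comments: NumPy's documentation for the legacy RandomState says that when no seed is given it reads from /dev/urandom (or the Windows analogue) if available and seeds from the clock only otherwise, so processes that call seed() at the same instant normally still get different seeds. A quick check, reseeding back to back in a tight loop (the helper name reseeded_draws is just for illustration):

```python
import numpy as np


def reseeded_draws(n):
    draws = []
    for _ in range(n):
        # No argument: reseed the global generator from OS entropy
        # (the clock is only a fallback when no OS source is available).
        np.random.seed()
        draws.append(np.random.random_sample())
    return draws


if __name__ == "__main__":
    print(reseeded_draws(5))
```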

scipy.random.seed just references the function. You need to actually call it with scipy.random.seed().

dan04

You need to provide a different seed value for each call, and the argument X that each call receives from the range will do. I was not able to execute your code, but I created a simplified version of it:

from multiprocessing import Pool
import random


def testfun(seed_: int):
    random.seed(seed_)
    y = random.gauss(0, 1)
    return y


if __name__ == "__main__":
    pool = Pool(8)
    result = pool.map(testfun, range(8))
    print(result)

It is also better to put the pool into a with context manager.

DanielTuzes