I wish to run several instances of a simulation in parallel, but with each simulation having its own independent data set.

Currently I implement this as follows:

P = mp.Pool(ncpus)  # generate pool of workers
for j in range(nrun):  # generate processes
    sim = MDF.Simulation(tstep, temp, time, writeout, boundaryxy, boundaryz, relax, insert, lat, savetemp)
    lattice = MDF.Lattice(tstep, temp, time, writeout, boundaryxy, boundaryz, relax, insert, lat, kb, ks, kbs, a, p, q, massL, randinit, initvel, parangle, scaletemp, savetemp)
    adatom1 = MDF.Adatom(tstep, temp, time, writeout, boundaryxy, boundaryz, relax, insert, lat, ra, massa, amorse, bmorse, r0, z0, name, lattice, samplerate, savetemp)
    P.apply_async(run, (j, sim, lattice, adatom1), callback=After)  # run simulation and ISF analysis in each process
P.close()
P.join()  # wait for all processes to finish

where sim, adatom1 and lattice are objects passed to the function run, which initiates the simulation.

However, I recently found out that every batch run simultaneously (that is, each group of ncpus runs out of the total nrun simulation runs) gives exactly the same results.

Can someone here enlighten how to fix this?
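For reference, the symptom can be illustrated without the MDF code at all. The sketch below (hypothetical, not the simulation code) mimics workers that all start from the same inherited NumPy RNG state, as forked processes do, and shows that they replay identical pseudo-random sequences:

```python
import numpy as np

def run_once(state):
    # Hypothetical stand-in for one simulation run: the "worker" starts
    # from the inherited RNG state and draws some random numbers.
    np.random.set_state(state)
    return np.random.rand(3).tolist()

# On Unix, fork() hands every worker a copy of the parent's RNG state;
# here we imitate that by capturing the state once and reusing it.
inherited = np.random.get_state()
batch = [run_once(inherited) for _ in range(4)]

print(all(r == batch[0] for r in batch))  # every "run" is identical
```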

skrrgwasme
Mickey Diamant
  • How do you obtain the results? – Janne Karila Feb 09 '12 at 10:49
  • Does this involve random numbers? How are you setting the seeds? Why should these be different? If you run the same process twice it's supposed to produce the same result twice. Why do you think they should be different? – S.Lott Feb 09 '12 at 10:54
  • The function 'run' starts the simulation and returns the results, which are directed to the function 'After' to consolidate all the results. Each simulation has random initial conditions, which is why I expect to have different results. I don't use a seed, I think. I use the following code: `randshift = np.random.rand(a,b)-0.5*np.ones((a,b))` – Mickey Diamant Feb 09 '12 at 10:59
  • Do you get different results if you replace `apply_async` with a direct call to `After(run(j,sim,lattice,adatom1))`? – Janne Karila Feb 09 '12 at 11:20
  • Janne, I tried your suggestion. It simply runs each sim one after the other, without parallelism. – Mickey Diamant Feb 09 '12 at 12:33
  • Well, it seems that I can assign different seeds to each run, but with no change. However, I did notice that all the simulation runs in a single batch have the same PID. – Mickey Diamant Feb 09 '12 at 13:04
  • Solved, I think. Per advice here ([link](http://stackoverflow.com/questions/6914240/multiprocessing-pool-seems-to-work-in-windows-but-not-in-ubuntu)) I added `scipy.random.seed` in the calling function 'run'. – Mickey Diamant Feb 09 '12 at 13:34
  • Do not put "solved" in the question or in a comment. Please post an **Answer** that explains the solution. Do not add comments with critical details. Please **update** the question to include all the facts. – S.Lott Feb 09 '12 at 13:52
  • @MickeyDiamant can you post some code on how you solved it? An answer with actual code would be super helpful. – Charlie Parker Apr 05 '17 at 02:53
  • @S.Lott why should each process produce the exact same result? They are different processes, so isn't expecting different results sensible? Why not? – Charlie Parker Apr 05 '17 at 03:06

3 Answers

Just thought I would add an actual answer to make it clear for others.

Quoting the answer from aix in this question:

What happens is that on Unix every worker process inherits the same state of the random number generator from the parent process. This is why they generate identical pseudo-random sequences.

Use the random.seed() method (or the scipy/numpy equivalent) to set the seed properly. See also this numpy thread.

dgorissen
  • Does this guarantee that any library using random numbers in each new process will correctly start with a new random seed? Or do we need to set the seed for each library separately? – Charlie Parker Apr 05 '17 at 02:50
  • I believe that this answer actually depends on the method by which the new processes are created ("spawn", "fork", or "forkserver"). If you are using "fork" (the default), then yes, the worker process inherits parent states. If you are using "spawn" then everything is "remade" and the random number generator will be in its default mode instead of copying from the parent (unless you explicitly tell it to re-use the same seed). – Mandias Apr 02 '22 at 05:23

This is a known problem. Try to generate a unique seed for each process. You can add the code below to the beginning of your function to overcome the issue.

import os
import time
import numpy as np

np.random.seed((os.getpid() * int(time.time())) % 123456789)
alercelik
  • Is the `os.getpid()` unique for every process? Because the processes/workers are created at similar moments, is there not a chance that this creates processes which will be using the same seeds? – HerChip Aug 17 '22 at 09:41
  • Yes, each process has a unique pid (process ID) no matter how it is created. On the other hand, threads in the same process share a pid, of course. – alercelik Aug 24 '22 at 05:13

A solution for the problem was to use scipy.random.seed() in the function run, which assigns a new seed to the random functions called from run.

A similar problem (from which I obtained the solution) can be found in multiprocessing.Pool seems to work in Windows but not in ubuntu?

skrrgwasme
  • Is there no way to set the random seed for every process that might use random numbers? Say one uses the modules random, numpy, scipy, tensorflow, and who knows what else. Is the only way to make sure the process has a different random seed to go through each of these and manually set the state? – Charlie Parker Apr 05 '17 at 03:09
  • You can pass a seed to each process as an input argument if you don't want to set them manually, e.g. `pool.map(func, seedlist)` and in func: `def func(myseed): np.random.seed(myseed)` – Maryam Hnr Aug 12 '18 at 01:45