
A question regarding the generation of random numbers in Numpy.

I have code that does the following:

import numpy as np

for i in range(very_big_number):

    np.random.randn(5)

    # other stuff that uses the generated random numbers

Since unfortunately very_big_number can really be a very large number, I wanted to break this loop into chunks, e.g. calling the same loop 10 times:

for i in range(very_big_number // 10):

    np.random.randn(5)

    # other stuff that uses the generated random numbers

and then collate all the output together. However, I want to make sure that this division into blocks preserves the randomness of my generated numbers.
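For what it's worth, chunking by itself is harmless as long as a single generator in a single process serves all the chunks: the chunks simply consume the same underlying stream. A minimal sketch, using the newer `default_rng` API (the Generator equivalent of `np.random.randn` is `standard_normal`), showing that the collated chunks reproduce the one-shot stream exactly:

```python
import numpy as np

# One big draw from a seeded generator.
rng = np.random.default_rng(0)
full = rng.standard_normal(50)

# Reset to the same seed and draw the same stream in 10 chunks of 5.
rng = np.random.default_rng(0)
chunks = [rng.standard_normal(5) for _ in range(10)]
collated = np.concatenate(chunks)

# collated and full are identical: chunking does not change the stream.
```

The problem only appears when the chunks run in *separate processes*, because each process then needs its own, independently seeded generator.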

My question is: reading the numpy documentation, or equivalently this question on StackOverflow, I would be tempted to think that it is enough to just divide the loop and run the subloops on, e.g., ten different cores at the same time. However, I would like to know whether that is correct, or whether I should set some random number seed, and if so, how.

johnhenry
  • If you are going to use multiple different processes then you need to call `np.random.seed`, otherwise the generated numbers will be the same (since the seed is copied to the new processes). You can create a random array of numbers in the parent process and pass the values to the children to use as seeds. There are several questions about it on SO. – jdehesa May 18 '18 at 16:37
  • would `np.random.seed()` called at the beginning of each "subloop" work? – johnhenry May 18 '18 at 16:44
  • If by "subloop" you mean a function that is offloaded to another process (e.g. using [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html) or [Joblib](https://pythonhosted.org/joblib/)) then yes, that's right. – jdehesa May 18 '18 at 16:50
  • @jdehesa I mean that yes, or even physically run it on another computer – johnhenry May 18 '18 at 16:57

1 Answer


Dividing the loop by itself does not guarantee independent streams; if the pieces end up with the same generator state, the randomness of the result is questionable.

Instead, go for proper parallel processing.

Try the "Joblib" library linked below, or any other parallel processing library you know:

https://pythonhosted.org/joblib/parallel.html

Joblib provides a simple helper class to write parallel for loops using multiprocessing
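To combine Joblib with the seeding discussed in the question's comments, each task can be given its own seed explicitly. A sketch, assuming joblib is installed (the parent seed `42` and chunk size are arbitrary):

```python
import numpy as np
from joblib import Parallel, delayed

def draw(seed, n=5):
    # Seed each task explicitly so parallel workers don't share a stream.
    rng = np.random.default_rng(seed)
    return rng.standard_normal(n)

# One child seed per chunk, all derived from a single parent seed.
seeds = np.random.SeedSequence(42).spawn(10)
chunks = Parallel(n_jobs=2)(delayed(draw)(s) for s in seeds)
result = np.concatenate(chunks)
```

Because the seeds travel with the tasks, the result is reproducible regardless of which worker runs which chunk.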

Pavan Chandaka
  • That's indeed what I want to use, but dividing the loop. Say I divide `100` into `10` times `10` and launch each time on a core, using joblib. Do you know if I should add any seed? – johnhenry May 18 '18 at 16:43
  • If you specify the same seed everywhere, you will notice a repeated set of numbers; if that is what you want, you can. – Pavan Chandaka May 18 '18 at 16:46
  • Moreover, if you want to use a seed, be careful with joblib: the documentation says seeding is not thread safe. – Pavan Chandaka May 18 '18 at 16:48