
Important Edit: If you find the time to test the snippets below, please make sure to start a completely fresh session or call np.random.seed(None) once.

Background:

I have been under the impression that functions such as np.random.randint() would draw the same set of numbers for identical random states (or whatever you would call the output from np.random.get_state()).

Let me explain why:

The following snippet uses np.random.randint() to generate 5 random integers between -10 and 10, and stores some info about the process. What I've named 'state' is the first 5 numbers of the array stored in the second element of the tuple returned by np.random.get_state().

Snippet 1

# 1. Imports
import pandas as pd
import numpy as np

# 2. describe random state by
# retrieving the five first numbers
# in the array in the second element
# of the tuple returned by np.random.get_state()
randomState = np.random.get_state()
state = np.random.get_state()[1][:5]

# 3. generate random numbers
randints = np.random.randint(-10, 10, size = 5)

# 4. organize and present findings
df = pd.DataFrame.from_dict({'state':state, 'randints':randints})
print(df)

Run this code once and you'll get results like those in the first output section below. Note that the numbers themselves will differ from mine since no random seed has been set. What matters is the internal logic of the three sets of output. If you run the same snippet more than once, you'll notice something that I think is really weird:

Output 1: some random numbers and a random state:

   randints       state
0       -10  2871458436
1         7  4226334938
2         1   179611462
3        -9  3145869243
4         5   317931933

So far, so good! We have 5 random integers and 5 numbers representing the random state. Run the same snippet again, and you'll get something like this:

Output 2: new random numbers and a new random state:

   randints       state
0         1   727254058
1         7  1473793264
2         4  2934556005
3         1   721863250
4        -6  3873014002

Now you seemingly have a new random state and 5 new random numbers, so my assumption still seems to hold. But every time I've tried this, things get weird when you run the same code a third time. Just look at this:

Output 3: new random numbers and the same random state as before:

   randints       state
0         8   727254058
1        -4  1473793264
2        -1  2934556005
3       -10   721863250
4        -1  3873014002

As you can see, my assumption was clearly wrong. What is really going on here?

Summary:

  1. Why does np.random.randint() return different integers for the same random state?
  2. Why does running this snippet yield different random states for the first and second run, but then return the same random state for the second and third run?

Thank you for any suggestions!

My system:

  • Python 3.6.0
  • IPython 5.1.0
  • Numpy 1.11.3
  • Spyder 3.2.7
  • Windows 64

Appendix:

You'll get the same result if you wrap the same procedure into a function and run it more than two times.

Snippet 2 - Same as Snippet 1 wrapped in a function

def rnumbers(numbers, runs):

    df_out = pd.DataFrame()
    runs = np.arange(runs)

    for r in runs:

        print(r)

        state = np.random.get_state()[1][:numbers]

        # 4. generate random numbers
        randints = np.random.randint(-10, 10, size = numbers)

        # 5. organize and present findings
        df_temp = pd.DataFrame.from_dict({'state_'+str(r+1):state, 'randints_'+str(r+1):randints})

        df_out = pd.concat([df_out, df_temp], axis = 1)

    return df_out

df = rnumbers(10,3)
print(df)

Output:

   randints_1     state_1  randints_2     state_2  randints_3     state_3
0           4  3582151794          -5  1773875493           7  1773875493
1          -7  2910116392          -8  2402690106           3  2402690106
2          -8  3435011439           3  1330293688           4  1330293688
3           1   486242985           4   847834894           2   847834894
4          -3  4214584559           4  4209159694          -2  4209159694
5           4   752109368          -3  2673278965           1  2673278965
6         -10  3726578976           8  2475058425           4  2475058425
7           8  1510778984          -5  3758042425           0  3758042425
8          -2  4202558983          -5  2381317628           0  2381317628
9           4  1514856120           6  3177587154          -7  3177587154
vestland
  • 1
    What is `np.random.seed(None)` supposed to do? Remove it and your output will be consistent. As for Snippet two - I can't confirm your output. [Mine is the same for all three rounds.](http://storage8.static.itmages.com/i/18/0413/h_1523607623_2843886_0834d1d951.jpeg) – Mr. T Apr 13 '18 at 08:20
  • I edited the question. An important point here is that the random seed has not been set. That was only a part of one of my many tests prior to asking the question. And yes, I'm aware of the point with `np.random.seed(None)` – vestland Apr 13 '18 at 08:29
  • 1
    1. It shouldn't. 2. Are you sure the random state is the same? Try printing out the entire `randomState`. I'm getting different randomState for different runs (Note: not for your second snippet, which since you have set random.seed should then give you the same randomState for different runs, as Mr. T has.) – Melvin Apr 13 '18 at 08:31
  • See Robert Kern's response here: https://stackoverflow.com/questions/5836335/consistenly-create-same-random-numpy-array/5837352#5837352 and here: https://stackoverflow.com/questions/37224116/difference-between-randomstate-and-seed-in-numpy – Melvin Apr 13 '18 at 08:32
  • @Melvin I edited the code to represent the problem (all set.seed sections were only there for testing purposes, and should have been commented out). I'm getting the very same results as described in the question when I'm starting a fresh session. – vestland Apr 13 '18 at 08:37
  • @Melvin And yes, I'm aware of the responses from Robert Kern about creating a reproducible random set. Now I'm just trying to understand the inner workings of numpy. – vestland Apr 13 '18 at 08:40
  • @Melvin Yes I'm pretty sure the entire random state is the same. You can try running `rnumbers(100,3)` and see for yourself. Just make sure that you're starting a fresh session and that no random seed has been set already. – vestland Apr 13 '18 at 08:53
  • 1
    The pos number, `[2]` in `np.random.get_state()`, changes across different runs. That (I'm guessing here) takes the corresponding value in the array before, to set the randomstate. To verify, set a session's randomstate using `np.random.set_state(x)` where x is copy-pasted, and convert the array portion into a np.array with dtype='uint32', for unsigned int32. Then, set the pos value to say 1, and play around with the other values in the array apart from [1]. You will get the same `randint` or any other random functions. Now, if you change pos or the value indicated by pos, the output changes. – Melvin Apr 13 '18 at 09:03

2 Answers


So to summarize the question: the first 5 numbers of a part of the random state are sometimes the same, but the output of the random generator is different.

The short answer is: the random state does change, but the first 5 numbers you are looking at remain the same. The change is in the number at index 2:

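import numpy as np
import pandas as pd

for i in range(3):
    # index 2 of the state tuple is the position counter, read before drawing
    state = np.random.get_state()[2]
    randints = np.random.randint(-10, 10, size = 5)
    # the scalar counter is repeated on every row of the frame
    df = pd.DataFrame.from_dict({'state':state, 'randints':randints})
    print(df)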

Output:

   randints  state
0        -9    624
1         6    624
2         4    624
3        -5    624
4         5    624
   randints  state
0        -9      5
1        -5      5
2         4      5
3        -4      5
4        -4      5
   randints  state
0         5     10
1        -8     10
2         8     10
3       -10     10
4        -3     10

Numpy uses the Mersenne Twister algorithm, which generates 32-bit random numbers in groups of 624 at a time. So we might expect the big state array to remain the same until all these numbers have been consumed and the Twister needs to be called again.

At index 2 of the state, it stores how many of these numbers have already been consumed. This starts out at 624, so the Twister is run once at the start, before generating any output. After that, you'll see the list remain the same until all 624 numbers have been consumed. Then the Twister is called again, the counter is reset to 0, and the entire thing starts over.
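
To check the description above without relying on a fresh session, here is a minimal sketch that compares the whole 624-element vector rather than its first 5 entries. It assumes a private np.random.RandomState instance with an arbitrary seed (12345) instead of the global generator; the helper name snapshot is made up for this illustration.

```python
import numpy as np

# A private, seeded generator so the sketch does not depend on session state.
# The seed 12345 is an arbitrary value chosen for illustration.
rng = np.random.RandomState(12345)

def snapshot(generator):
    """Return (copy of the 624-word vector, pos) from the generator state."""
    _, key, pos = generator.get_state()[:3]
    return key.copy(), int(pos)

key0, pos0 = snapshot(rng)      # freshly seeded: pos sits at the end (624)
rng.randint(-10, 10, size=5)
key1, pos1 = snapshot(rng)      # the first draw forced a refill of the vector
rng.randint(-10, 10, size=5)
key2, pos2 = snapshot(rng)      # the second draw only advanced pos

print("vector changed by 1st call:", not np.array_equal(key0, key1))
print("vector changed by 2nd call:", not np.array_equal(key1, key2))
print("pos before and after each call:", pos0, pos1, pos2)
```

The first comparison should report a change and the second should not, which matches the pattern in the question: a new "state" after the first run, then the same "state" with different integers.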

Thomas
  • Thanks for accepting, but I personally like @jotasi's answer better :) It goes into more depth, and goes into detail about a subtlety of `pos` that I must have overlooked because I got lucky rolls. – Thomas Apr 17 '18 at 08:57

The reason for this behavior is that you are only checking whether the state vector is the same. There is another important part of the RandomState, namely the position pos, which indicates how much of the state vector has been "used up". It is given by the integer after the state array in the return values of get_state() (see the docs of get_state()). Every 32-bit value of pseudo-randomness requested depends on only one of the elements of the state vector. Cross-dependencies between the elements arise only in the refilling procedure. (For more details on the PRNG, see for example the Wikipedia page on the Mersenne Twister that is employed here.)
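
As a rough illustration that the vector and pos together determine the output, the following sketch saves the full state tuple, draws, restores the tuple, and draws again. It assumes a private np.random.RandomState with an arbitrary seed (2018); the names saved, first, replay and shifted are made up for this example.

```python
import numpy as np

rng = np.random.RandomState(2018)   # arbitrary seed, chosen for illustration
rng.randint(-10, 10, size=5)        # draw once so pos lies inside the vector

saved = rng.get_state()             # full tuple: name, vector, pos, gauss info
first = rng.randint(-10, 10, size=5)

rng.set_state(saved)                # restore vector AND pos
replay = rng.randint(-10, 10, size=5)
print(np.array_equal(first, replay))    # identical full state, identical draws

# Same vector, but pos moved forward by 5: the draws start at other elements,
# so they will in general differ from `first`.
name, key, pos, has_gauss, cached = saved
rng.set_state((name, key, pos + 5, has_gauss, cached))
shifted = rng.randint(-10, 10, size=5)
print(first, shifted)
```

So the assumption in the question holds as long as "random state" means the whole tuple returned by get_state(), not just the array in its second element.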

In the initialization the vector will be filled up based on the seed and then the position will be set to the end (as you can see here in numpy's sources).

import numpy as np
state = np.random.get_state()
print(state[1].shape)           # (624, )
print(state[2])                 # 624

When you now request a byte of pseudo-randomness, this function will be called, which includes the check of how much of the vector is used. As pos was set to the length of the state vector (624, as printed above), a refill is triggered and pos is set to 0. That's why you get a different array in the RandomState after your first call to randint.

np.random.randint(10)
state = np.random.get_state()
print(state[1].shape)           # (624, ) -> same shape, but the contents are now different than before
print(state[2])                 # 1

For your subsequent calls, pos is always smaller than the length of the vector and therefore, only pos is incremented but the vector is not refilled. That only happens if you have requested enough random numbers to exhaust the array in the RandomState.

np.random.randint(10)
state = np.random.get_state()
print(state[1].shape)           # (624, ) -> this time the contents did not change
print(state[2])                 # 3

Note, however, that the exact increase of pos will depend on the data type of the random numbers you request, so the increase of pos (state[2]) is not easily predicted. You cannot expect it to increase by 1 after every np.int32 you request via randint.

Edit:
I was slightly confused about the non-deterministic increase of pos in the above example. This is caused by the method ensuring that the values are within the correct interval. randint (assuming np.int32 as dtype) internally calls _rand_int32, which in turn calls rk_random_uint32, where rng is a parameter indicating the width of the range of random integers to be drawn. On that basis, a mask is created to only keep the appropriate bits. If now your range is not a power of 2, there are still values (with the last bits being between rng and the next power of 2) which are invalid if they are drawn and which then are discarded. Therefore, depending on the seed different numbers of tries are needed to find a valid number in the correct range. If you instead pick a range that is a power of two, you get the expected increase of one for every drawn random number:

In [1]: import numpy as np

In [2]: print(np.random.get_state()[2])
624

In [3]: for i in range(10):
   ...:     np.random.randint(64, size=100, dtype=np.int32)
   ...:     print(i, np.random.get_state()[2])
   ...:     
0 100
1 200
2 300
3 400
4 500
5 600
6 76
7 176
8 276
9 376

After 624 random numbers, the state vector is used up and you can see pos being reset.
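
The mask-and-discard idea described in the edit above can be sketched in plain Python. This is not numpy's actual C code, only an illustration of the technique under stated assumptions: bounded_draw is a made-up helper, and Python's own random module stands in as the source of 32-bit values.

```python
import random

def bounded_draw(rng, source):
    """Return (value, draws_used) with value uniform in [0, rng] inclusive."""
    # Smallest all-ones bit mask that covers rng, e.g. rng=19 -> mask=31.
    mask = (1 << rng.bit_length()) - 1
    draws = 0
    while True:
        draws += 1
        value = source.getrandbits(32) & mask   # keep only the low bits
        if value <= rng:                        # otherwise discard and retry
            return value, draws

source = random.Random(0)   # arbitrary seed, chosen for illustration

# Width 19 matches randint(-10, 10): 20 possible values, mask 31.
results_19 = [bounded_draw(19, source) for _ in range(1000)]
draws_19 = sum(d for _, d in results_19)

# Width 63 matches randint(64): mask 63, so no value is ever discarded.
results_63 = [bounded_draw(63, source) for _ in range(1000)]
draws_63 = sum(d for _, d in results_63)

print("32-bit values consumed for 1000 numbers, width 19:", draws_19)
print("32-bit values consumed for 1000 numbers, width 63:", draws_63)
```

With width 19, 12 of the 32 masked values are rejected, so more than one 32-bit value is consumed per result on average. With width 63 every masked value is accepted, which is why pos advances by exactly one per number in the session above.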

jotasi