Important Edit:
If you find the time to test the snippets below, please make sure to start a completely fresh session or call np.random.seed(None) once.
Background:
I have been under the impression that functions such as np.random.randint() would draw the same set of numbers for identical random states (or whatever you would call the output from np.random.get_state()).
Let me explain why:
The following snippet uses np.random.randint() to generate 5 random integers between -10 and 9 (the upper bound of 10 is exclusive), and stores some info about the process. What I've named 'state' are the first 5 numbers from the array stored in the second element of the tuple returned by np.random.get_state().
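For reference, here is a minimal sketch of what np.random.get_state() returns, assuming the legacy global generator and its default tuple format. The variable names are my own labels for the five tuple elements, not NumPy identifiers.

```python
import numpy as np

# Unpack the 5-tuple returned for the legacy global generator.
# The names on the left are illustrative labels only.
name, key, pos, has_gauss, cached_gaussian = np.random.get_state()

print(name)       # name of the generator algorithm
print(key.shape)  # shape of the array that 'state' is sliced from
print(key.dtype)  # element type of that array
print(pos)        # integer position index stored alongside the array
```

The array in the second element is only one part of the tuple; the remaining elements are part of the state as well.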
Snippet 1
# 1. Imports
import pandas as pd
import numpy as np
# 2. describe random state by
# retrieving the five first numbers
# in the array in the second element
# of the tuple returned by np.random.get_state()
randomState = np.random.get_state()
state = randomState[1][:5]
# 3. generate random numbers
randints = np.random.randint(-10, 10, size = 5)
# 4. organize and present findings
df = pd.DataFrame.from_dict({'state':state, 'randints':randints})
print(df)
Run this code once and you'll get results like those in the first output section below. Note that the numbers themselves will differ from mine since no random seed has been set. What is important is the internal logic of the three sets of output. And if you run the same snippet more than once, you'll notice something that I think is really weird:
Output 1: some random numbers and a random state:
randints state
0 -10 2871458436
1 7 4226334938
2 1 179611462
3 -9 3145869243
4 5 317931933
So far, so good! We have 5 random integers and 5 numbers representing the random state. Run the same snippet again, and you'll get something like this:
Output 2: new random numbers and a new random state:
randints state
0 1 727254058
1 7 1473793264
2 4 2934556005
3 1 721863250
4 -6 3873014002
Now you seemingly have a new random state and 5 new random numbers, so my assumption still appears to hold. But every time I've tried this, things get weird when you run the same code a third time. Just look at this:
Output 3: new random numbers and the same random state as before:
randints state
0 8 727254058
1 -4 1473793264
2 -1 2934556005
3 -10 721863250
4 -1 3873014002
As you can see, my assumption was clearly wrong. What is really going on here?
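One check that might help narrow this down, sketched under the assumption that the legacy global generator is in use: compare the whole tuple from np.random.get_state() rather than only the first five entries of its array, then restore a saved state with np.random.set_state() and draw again. The seed value and the variable names are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(12345)               # arbitrary seed, only so the sketch is repeatable
np.random.randint(-10, 10, size=5)  # warm-up draw

saved = np.random.get_state()       # full state tuple before drawing
first = np.random.randint(-10, 10, size=5)
after = np.random.get_state()       # full state tuple after drawing

# Compare the array slice used as 'state' in Snippet 1, and the position index
same_first_five = np.array_equal(saved[1][:5], after[1][:5])
pos_before, pos_after = saved[2], after[2]
print(same_first_five, pos_before, pos_after)

# Restore the complete saved state and draw again
np.random.set_state(saved)
second = np.random.randint(-10, 10, size=5)
print(np.array_equal(first, second))
```

If the first five entries match while the position index differs, then those five numbers alone do not identify the random state.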
Summary:
- Why does np.random.randint() return different integers for the same random state?
- Why does running this snippet yield different random states for the first and second run, but then return the same random state for the second and third run?
Thank you for any suggestions!
My system:
- Python 3.6.0
- IPython 5.1.0
- Numpy 1.11.3
- Spyder 3.2.7
- Windows 64
Appendix:
You'll get the same result if you wrap the same procedure into a function and run it more than two times.
Snippet 2 - Same as Snippet 1 wrapped in a function
import numpy as np
import pandas as pd

def rnumbers(numbers, runs):
    df_out = pd.DataFrame()
    runs = np.arange(runs)
    for r in runs:
        print(r)
        # describe the random state by the first `numbers` entries of its array
        state = np.random.get_state()[1][:numbers]
        # generate random numbers
        randints = np.random.randint(-10, 10, size=numbers)
        # organize and present findings
        df_temp = pd.DataFrame.from_dict({'state_'+str(r+1): state, 'randints_'+str(r+1): randints})
        df_out = pd.concat([df_out, df_temp], axis=1)
    return df_out

df = rnumbers(10, 3)
print(df)
Output:
   randints_1     state_1  randints_2     state_2  randints_3     state_3
0           4  3582151794          -5  1773875493           7  1773875493
1          -7  2910116392          -8  2402690106           3  2402690106
2          -8  3435011439           3  1330293688           4  1330293688
3           1   486242985           4   847834894           2   847834894
4          -3  4214584559           4  4209159694          -2  4209159694
5           4   752109368          -3  2673278965           1  2673278965
6         -10  3726578976           8  2475058425           4  2475058425
7           8  1510778984          -5  3758042425           0  3758042425
8          -2  4202558983          -5  2381317628           0  2381317628
9           4  1514856120           6  3177587154          -7  3177587154