As the title says, it seems very tedious to set random_state
for every randomness-related pandas function. Is there any way to set it only once so that the random state is fixed for all functions?

Mr.cysl
-
https://stackoverflow.com/questions/11526975/set-random-seed-programwide-in-python – BENY Sep 17 '18 at 20:39
-
This arg is optional, no? – Oliver Charlesworth Sep 17 '18 at 20:40
-
@Wen Does this work with pandas? – Mr.cysl Sep 17 '18 at 20:40
-
@OliverCharlesworth Yes it is. But I am trying to make sure I could reproduce what I am doing, so I need to set random_state for every (applicable) function. – Mr.cysl Sep 17 '18 at 20:42
1 Answer
Pandas functions get their random source by calling pd.core.common._random_state, which accepts a single state
argument, defaulting to None. From its docs:
    Parameters
    ----------
    state : int, np.random.RandomState, None.
        If receives an int, passes to np.random.RandomState() as seed.
        If receives an np.random.RandomState object, just returns object.
        If receives `None`, returns np.random.
        If receives anything else, raises an informative ValueError.
        Default None.
So if it gets None, which is the default value for the caller's random_state, it returns the np.random
module itself:
In [247]: pd.core.common._random_state(None)
Out[247]: <module 'numpy.random' from 'C:\\Python\\lib\\site-packages\\numpy\\random\\__init__.py'>
and it will use the global numpy state. So:
In [262]: np.random.seed(3)
In [263]: pd.Series(range(10)).sample(3).tolist()
Out[263]: [5, 4, 1]
In [264]: pd.DataFrame({0: range(10)}).sample(3)[0].tolist()
Out[264]: [3, 8, 2]
In [265]: np.random.seed(3)
In [266]: pd.Series(range(10)).sample(3).tolist()
Out[266]: [5, 4, 1]
In [267]: pd.DataFrame({0: range(10)}).sample(3)[0].tolist()
Out[267]: [3, 8, 2]
If any method doesn't respect this, it's a bug.
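As a rough, self-contained sketch of the same idea (the variable names `s`, `first_run`, `second_run`, `a`, and `b` are only illustrative), the script below seeds NumPy's global state once, makes several pandas calls without `random_state`, and checks that re-seeding reproduces them. It also shows the alternative of passing an explicit `np.random.RandomState`, which may be preferable if other code touches the global state:

```python
import numpy as np
import pandas as pd

s = pd.Series(range(10))

# Seed NumPy's global state once; pandas calls made without
# random_state draw from that global state.
np.random.seed(3)
first_run = [s.sample(3).tolist(), s.sample(3).tolist()]

# Re-seeding with the same value replays the same sequence of draws.
np.random.seed(3)
second_run = [s.sample(3).tolist(), s.sample(3).tolist()]

print(first_run == second_run)  # prints True

# Alternative: an explicit RandomState that does not depend on
# (or disturb) the global state.
a = s.sample(3, random_state=np.random.RandomState(3)).tolist()
b = s.sample(3, random_state=np.random.RandomState(3)).tolist()

print(a == b)  # prints True
```

Note that with the global seed, reproducibility depends on making the same random calls in the same order after seeding; any extra draw from `np.random` in between shifts the results that follow.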

DSM
-
So whenever I set numpy's random seed and do not pass any sort of random_state to pandas operations, my code will still be deterministic based on np.random.seed. Is that right? – Mr.cysl Sep 17 '18 at 20:50
-
Thanks!! Also, is there a connection between `np.random.seed` and `random.seed`? – Mr.cysl Sep 17 '18 at 20:53