As the title says, it seems very tedious to set random_state for every randomness-related pandas function. Is there a way to set it only once, so that the random state applies to all functions?

Mr.cysl

1 Answer

Pandas functions get their random source by calling pd.core.common._random_state, which accepts a single state argument, defaulting to None. From its docs:

Parameters
----------
state : int, np.random.RandomState, None.
    If receives an int, passes to np.random.RandomState() as seed.
    If receives an np.random.RandomState object, just returns object.
    If receives `None`, returns np.random.
    If receives anything else, raises an informative ValueError.
    Default None.
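To make those rules concrete, here is a minimal sketch of the same dispatch logic. It is not the pandas source; `resolve_random_state` is a hypothetical name used only for illustration, and it covers just the cases listed in the docstring above.

```python
import numpy as np


def resolve_random_state(state=None):
    """Hypothetical stand-in that mirrors the documented rules (not pandas code)."""
    # An int is treated as a seed for a fresh, independent RandomState.
    if isinstance(state, (int, np.integer)):
        return np.random.RandomState(state)
    # An existing RandomState object is returned unchanged.
    if isinstance(state, np.random.RandomState):
        return state
    # None falls back to the np.random module, i.e. NumPy's global state.
    if state is None:
        return np.random
    # Anything else is rejected.
    raise ValueError("state must be an int, a np.random.RandomState, or None")


rs = np.random.RandomState(0)
print(resolve_random_state(None) is np.random)  # global module is returned
print(resolve_random_state(rs) is rs)           # same object is passed through
```

The None branch is the one that matters for the question: nothing new is created, so whatever seed the global NumPy state has is what gets used.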

So if it gets None, which is the default value for the caller's random_state, it returns the np.random module itself:

In [247]: pd.core.common._random_state(None)
Out[247]: <module 'numpy.random' from 'C:\\Python\\lib\\site-packages\\numpy\\random\\__init__.py'>

and it will then use NumPy's global state, so seeding NumPy once is enough:

In [262]: np.random.seed(3)

In [263]: pd.Series(range(10)).sample(3).tolist()
Out[263]: [5, 4, 1]

In [264]: pd.DataFrame({0: range(10)}).sample(3)[0].tolist()
Out[264]: [3, 8, 2]

In [265]: np.random.seed(3)

In [266]: pd.Series(range(10)).sample(3).tolist()
Out[266]: [5, 4, 1]

In [267]: pd.DataFrame({0: range(10)}).sample(3)[0].tolist()
Out[267]: [3, 8, 2]

If any method doesn't respect this, it's a bug.
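The session above can also be written as a small script. This is only a sketch, assuming a pandas version in which `sample` still falls back to the global NumPy state when `random_state` is left as None; `draw_samples` is a hypothetical helper name, and the exact values drawn may differ between versions, so the script compares the two runs instead of hard-coding them.

```python
import numpy as np
import pandas as pd


def draw_samples():
    """Draw from a Series and a DataFrame without passing random_state."""
    s = pd.Series(range(10)).sample(3).tolist()
    d = pd.DataFrame({0: range(10)}).sample(3)[0].tolist()
    return s, d


# Seed the global NumPy state once, then draw.
np.random.seed(3)
first = draw_samples()

# Re-seeding with the same value should reproduce the same draws.
np.random.seed(3)
second = draw_samples()

print(first)
print(second)
print(first == second)
```

Keep in mind that the seed is global: any other code that draws from np.random between the seed call and the pandas call will shift the results.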

DSM