52

I am trying to generate scipy.stats.pareto.rvs(b, loc=0, scale=1, size=1) with different seed.

In numpy we can seed using numpy.random.seed(seed=233423).

Is there any way to seed the random number generated by scipy stats.

Note: I am not using numpy pareto because I want to give different values for scale.

ashok
  • 625
  • 1
  • 5
  • 5
  • 2
    For the more convenient and robust modern world of the Generator class, see @Abhinav's answer – nealmcb Nov 08 '20 at 22:52

4 Answers4

58

scipy.stats just uses numpy.random to generate its random numbers, so numpy.random.seed() will work here as well. E.g.,

import numpy as np
from scipy.stats import pareto
b = 0.9
np.random.seed(seed=233423)
print pareto.rvs(b, loc=0, scale=1, size=5)
np.random.seed(seed=233423)
print pareto.rvs(b, loc=0, scale=1, size=5)

will print [ 9.7758784 10.78405752 4.19704602 1.19256849 1.02750628] twice.

Kelsey
  • 992
  • 7
  • 7
34

For those who are stumbling upon this question 7 years later, there has been a major change in numpy Random State generator function. As per the documentation here and here, the RandomState class is replaced with the Generator class. RandomState is guaranteed to be compatible with older versions/codes however it will not receive any substantial changes, including algorithmic improvements, which are reserved for Generator.

For clarifications on how to pass an existing Numpy based random stream to Scipy functions in the same experiment, given below are some examples and reasonings for which cases are desirable and why.

from numpy.random import Generator, PCG64
from scipy.stats import binom

n, p, size, seed = 10, 0.5, 10, 12345

# Case 1 : Scipy uses some default Random Generator
numpy_randomGen = Generator(PCG64(seed))
scipy_randomGen = binom
print(scipy_randomGen.rvs(n, p, size))
print(numpy_randomGen.binomial(n, p, size))
# prints
# [6 6 5 4 6 6 8 6 6 4]
# [4 4 6 6 5 4 5 4 6 7]
# NOT DESIRABLE as we don't have control over the seed of Scipy random number generation


# Case 2 : Scipy uses same seed and Random generator (new object though)
scipy_randomGen.random_state=Generator(PCG64(seed))
numpy_randomGen = Generator(PCG64(seed))
print(scipy_randomGen.rvs(n, p, size))
print(numpy_randomGen.binomial(n, p, size))
# prints
# [4 4 6 6 5 4 5 4 6 7]
# [4 4 6 6 5 4 5 4 6 7]
    # This experiment is using same sequence of random numbers, one is being used by Scipy
# and other by Numpy. NOT DESIRABLE as we don't want repetition of some random 
# stream in same experiment.


# Case 3 (IMP) : Scipy uses an existing Random Generator which can being passed to Scipy based 
# random generator object
numpy_randomGen = Generator(PCG64(seed))
scipy_randomGen.random_state=numpy_randomGen
print(scipy_randomGen.rvs(n, p, size))
print(numpy_randomGen.binomial(n, p, size))
# prints
# [4 4 6 6 5 4 5 4 6 7]
# [4 8 6 3 5 7 6 4 6 4]
# This should be the case which we mostly want (DESIRABLE). If we are using both Numpy based and 
#Scipy based random number generators/function, then not only do we have no repetition of 
#random number sequences but also have reproducibility of results in this case.
Abhinav
  • 1,882
  • 1
  • 18
  • 34
  • 2
    Very helpful! This is the new normal, best answer :) – nealmcb Nov 08 '20 at 22:32
  • For case 3, does this change the random state for all uses of binom, even by other modules? Does that mean that changes in usage of binom in other modules will spoil the reproducability of its use here? Would it be better to explicitly pass the numpy_randomGen to a local instance of a distribution, combining your approach with the answer by Scotty1-? – nealmcb Nov 12 '20 at 16:36
  • If I understood your question correctly, I think the state of all uses of Scipy.binom within the same scope/module will change or where ever we pass the scipy_randomgen object explicitly. – Abhinav Nov 12 '20 at 18:40
  • 1
    I would think it would reach beyond the current module and scope. Wouldn't any code that imports binom pick up the changed state, including third-party code that you have no control over? I.e. passing the scipy_randomgen object to binom also transfers it to other code that imports binom. But I haven't tested that.... – nealmcb Nov 13 '20 at 19:20
21

For those who happen upon this post four years later, Scipy DOES provide a way to pass a np.random.RandomState object to its random variable classes, see rv_continuous and rv_discrete for more details. The scipy documentation says this:

seed : None or int or numpy.random.RandomState instance, optional

This parameter defines the RandomState object to use for drawing random variates. If None (or np.random), the global np.random state is used. If integer, it is used to seed the local RandomState instance. Default is None.

Unfortunately, it appears this argument is not available after the continuous/discrete rvs subclass rv_continuous or rv_discrete. However, the random_state property does belong to the sublass, meaning we can set the seed using an instance of np.random.RandomState after instantiation like so:

import numpy as np
import scipy.stats as stats

alpha_rv = stats.alpha(3.57)
alpha_rv.random_state = np.random.RandomState(seed=342423)
user5915738
  • 450
  • 5
  • 12
  • 1
    I think this is the better answer, [see](https://stackoverflow.com/questions/59105921/why-is-numpy-random-seed-not-remaining-fixed-but-randomstate-is-when-run-in-para) the problems you can run into when using `numpy.random.seed`, `RandomState` is more robust *imo*... – RK1 Apr 20 '20 at 10:12
9

Adding to the answer of user5915738, which I think is the best answer in general, I'd like to point out the imho most convenient way to seed the random generator of a scipy.stats distribution.

You can set the seed while generating the distribution with the rvs method, either by defining the seed as an integer, which is used to seed np.random.RandomState internally:

uni_int_seed = scipy.stats.uniform(-.1, 1.).rvs(10, random_state=12)

or by directly defining the np.random.RandomState:

uni_state_seed = scipy.stats.uniform(-.1, 1.).rvs(
    10, random_state=np.random.RandomState(seed=12))

Both methods are equivalent:

np.all(uni_int_seed == uni_state_seed)
# Out: True

The advantage of this method over assigning it to the random_state of rv_continuous or rv_discrete is, that you always have explicit control over the random state of your rvs, whereas with my_dist.random_state = np.random.RandomState(seed=342423) the seed is lost after each call to rvs, possibly resulting in non-reproducible results when losing track of distributions.
Also according to the The Zen of Python:

  1. Explicit is better than implicit.

:)

JE_Muc
  • 5,403
  • 2
  • 26
  • 41