108

I am waiting for another developer to finish a piece of code that will return an np array of shape (100,2000) with values of either -1,0, or 1.

In the meantime, I want to randomly create an array of the same characteristics so I can get a head start on my development and testing. The thing is that I want this randomly created array to be the same each time, so that I'm not testing against an array that keeps changing its value each time I re-run my process.

I can create my array like this, but is there a way to create it so that it's the same each time? I could pickle the object and unpickle it, but I'm wondering if there's another way.

r = np.random.randint(3, size=(100, 2000)) - 1
user2357112
Idr

6 Answers

214

Create your own instance of numpy.random.RandomState() with your chosen seed. Do not use numpy.random.seed() except to work around inflexible libraries that do not let you pass around your own RandomState instance.

[~]
|1> from numpy.random import RandomState

[~]
|2> prng = RandomState(1234567890)

[~]
|3> prng.randint(-1, 2, size=10)
array([ 1,  1, -1,  0,  0, -1,  1,  0, -1, -1])

[~]
|4> prng2 = RandomState(1234567890)

[~]
|5> prng2.randint(-1, 2, size=10)
array([ 1,  1, -1,  0,  0, -1,  1,  0, -1, -1])
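Applied to the array from the question, the same approach might look like the sketch below. The helper name `make_fake_result` and its default seed are placeholders chosen for illustration; they are not part of the question or of NumPy.

```python
import numpy as np

def make_fake_result(seed=1234567890, shape=(100, 2000)):
    """Return a reproducible stand-in array with values in {-1, 0, 1}."""
    # Private generator: the global numpy.random state is left untouched
    prng = np.random.RandomState(seed)
    # randint's upper bound is exclusive, so (-1, 2) yields -1, 0 or 1
    return prng.randint(-1, 2, size=shape)

r1 = make_fake_result()
r2 = make_fake_result()
print(r1.shape)                # (100, 2000)
print(np.array_equal(r1, r2))  # True: same seed, same array
```

Once the other developer's code is ready, only the call to the placeholder function would need to be swapped out.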
Robert Kern
  • 9
    Do you have any rationale for your recommendation? What's wrong with `numpy.random.seed()`? I know it's not thread-safe, but it's really convenient if you don't need thread-safety. – Sven Marnach Apr 30 '11 at 19:54
  • 59
    It's mostly to form good habits. You may not need independent streams now, but Sven-6-months-from-now might. If you write your libraries to use the methods directly from `numpy.random`, you cannot make independent streams later. It's also easier to write libraries with the intention of having controlled PRNG streams. There are always multiple ways to enter your library, and each of them should have a way to control the seed. Passing around PRNG objects is a cleaner way of doing that than relying on `numpy.random.seed()`. Unfortunately, this comment box is too short to contain more examples.:-) – Robert Kern May 02 '11 at 19:03
  • Thanks for the reply. You are raising good points, and thinking about it a bit, I agree that this is cleaner. I still think for the OP's testing purposes `numpy.random.seed()` should be fine, but I'll edit `numpy.random.seed()` out of my own library code :) – Sven Marnach May 02 '11 at 21:28
  • 27
    Another way of describing Robert's rationale: using numpy.random.seed uses a global variable to keep the PRNG state, and the same standard reasons that global variables are bad apply here. – Robie Basak Mar 01 '12 at 11:19
  • If, on the contrary, the sequences must be independent, I want to seed the random generator with time.time(). However, the syntax RandomState(1234567890) is not working (?) – kiriloff Mar 02 '12 at 08:05
  • 9
    If you want the PRNGs to be independent, do not seed them with anything. Just use `numpy.random.RandomState()` with no arguments. This will seed the state with unique values drawn from your operating system facilities for such things (`/dev/urandom` on UNIX machines and the Windows equivalent there). If `numpy.random.RandomState(1234567890)` is not working for you, please show exactly what you typed and exactly the error message that you got. – Robert Kern Mar 03 '12 at 12:57
  • @RobertKern: what do you think of seeding using int(time.time())? – Alex Oct 22 '14 at 16:10
  • 5
    Not a good idea. Use `numpy.random.RandomState()` with no arguments for the best results. – Robert Kern Oct 24 '14 at 11:00
  • 1
    This answer gave me much more than I was looking for when I was searching for the solution to the same problem the poster explained. Nice way to get rid of the implicit global variable that is the rng used when obtaining random numbers directly from numpy.random. – zeycus Aug 29 '16 at 09:19
  • 2
    Another reason to use your own instance of `RandomState` from the beginning is that it will help to create reproducible unit tests. As I found out the hard way, if you `import numpy` into unittest and do `numpy.random.seed`, it *won't* change the seed in the module you're testing! – Legendre17 Jul 26 '18 at 22:51
  • I think the comment from @RobieBasak should be in the accepted answer for people reading this and looking for a simple explanation of the pros and cons of random.seed vs random.RandomState. – DoubleOZ Dec 14 '19 at 21:03
  • 3
    PRNG = PseudoRandom Number Generator (for those wondering what those letters stand for, see Wikipedia for more information.) – Sarye Haddadi Jan 01 '21 at 16:01
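The comments above recommend passing PRNG objects around instead of relying on the global seed. A minimal sketch of what that could look like follows; the function `noisy_signs` and its `prng` parameter are hypothetical names used only for illustration.

```python
import numpy as np

def noisy_signs(n, prng=None):
    """Hypothetical library function: draw n values from {-1, 0, 1}."""
    if prng is None:
        # Unseeded instance: initialised from operating-system entropy
        prng = np.random.RandomState()
    return prng.randint(-1, 2, size=n)

# Two independent streams, each reproducible on its own
stream_a = np.random.RandomState(1)
stream_b = np.random.RandomState(2)
a = noisy_signs(5, stream_a)
b = noisy_signs(5, stream_b)

# Re-creating stream A with the same seed replays it, whatever stream B did
a_again = noisy_signs(5, np.random.RandomState(1))
print(np.array_equal(a, a_again))  # True
```

Every entry point that draws random numbers accepts the generator as an argument, so callers decide whether a run is seeded or not.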
99

Simply seed the random number generator with a fixed value, e.g.

numpy.random.seed(42)

This way, you'll always get the same random number sequence.

This function will seed the global default random number generator, and any call to a function in numpy.random will use and alter its state. This is fine for many simple use cases, but it's a form of global state with all the problems global state brings. For a cleaner solution, see Robert Kern's answer.
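For the array in the question, a quick check of this could look like the following sketch (the seed value 42 is arbitrary):

```python
import numpy as np

np.random.seed(42)
r1 = np.random.randint(3, size=(100, 2000)) - 1

np.random.seed(42)  # reset the global generator to the same starting point
r2 = np.random.randint(3, size=(100, 2000)) - 1

print(np.array_equal(r1, r2))  # True
```

Any other call into numpy.random between the seed and the draw would shift the sequence, which is the global-state caveat mentioned above.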

Sven Marnach
  • 52
    Someone snuck in the `numpy.random.seed()` function when I wasn't paying attention. :-) I intentionally left it out of the original module. I recommend that people use their own instances of `RandomState` and passing those objects around. – Robert Kern Apr 29 '11 at 21:01
  • 6
    Robert is a major contributor to numpy. I think we should give his opinion some weight. – deprecated May 01 '11 at 00:27
  • 13
    @deprecated: I'm thankful for Robert's work, but his work isn't a substitute for giving a rationale for the recommendation. Furthermore, if the use of `numpy.random.seed()` is discouraged, this should be mentioned in [the documentation](http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.seed.html#numpy.random.seed). Apparently, other contributors to NumPy don't share Robert's opinion. No offense intended at all, I'm just curious. – Sven Marnach May 01 '11 at 11:11
  • 15
    This is the same as using `random.seed` vs. using a `random.Random` object in the Python standard library. If you use `random.seed` or `numpy.random.seed`, you are seeding *all* random instances, both in your code and in any code that you are calling or any code that is run in the same session as yours. If those things depend on those things being actually random, then you start to run into problems. If you deploy code that sets the random seed, you can introduce a security vulnerability. – asmeurer Mar 24 '14 at 18:33
  • 5
    @asmeurer Anyone using a pseudorandom number generator for security purposes probably doesn't know what they're doing. – JAB Apr 28 '16 at 18:30
  • 1
    @JAB They may or they may not. In the world of Evidence-Based Elections and Risk-Limiting Audits, we most certainly depend on pseudorandom generators for security purposes. We need verifiably unpredictable sampling of ballots, so we can select and check paper ballots against the electronic records, or unpredictable numbers for simulations, etc. If some debugging code makes things predictable, that would be a big deal. Though, since Mersenne Twister isn't a good choice, we roll our own (dice for seeds and cryptographically-secure pseudorandom sequences), and this sort of thing wouldn't bite us. – nealmcb Nov 12 '18 at 22:24
  • @nealmcb Most random number generators used in cryptography are pseudo-random, and there is nothing wrong with that. – Sven Marnach Nov 13 '18 at 08:55
  • 2
    @JAB a secure yet resource-efficient approach is to use true random for the seed and pseudorandom for the next sequences – Muhammad Nizami Dec 20 '18 at 04:00
  • This answer nowadays raises `TypeError: 'int' object is not callable` – johnnyheineken Jul 29 '19 at 10:33
  • I need to use this function np.random.normal() and I need a way to produce same values always. is there a way – ABHISHEK D Nov 25 '20 at 09:02
  • @ABHISHEKD Yes, both this answer and the next one will work for that use case. Note that `RandomState` objects have a `normal` method that you can use. – Sven Marnach Nov 25 '20 at 19:59
8

I just want to clarify something regarding @Robert Kern's answer, in case it is not clear. Even if you use RandomState, you have to re-initialize it with the same seed every time you want to reproduce the same array, as in Robert's example. Otherwise each successive call continues the stream and returns different values:

Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> prng = np.random.RandomState(2019)
>>> prng.randint(-1, 2, size=10)
array([-1,  1,  0, -1,  1,  1, -1,  0, -1,  1])
>>> prng.randint(-1, 2, size=10)
array([-1, -1, -1,  0, -1, -1,  1,  0, -1, -1])
>>> prng.randint(-1, 2, size=10)
array([ 0, -1, -1,  0,  1,  1, -1,  1, -1,  1])
>>> prng.randint(-1, 2, size=10)
array([ 1,  1,  0,  0,  0, -1,  1,  1,  0, -1])
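To make the point concrete, here is a small sketch (variable names are illustrative) showing that a fresh instance with the same seed replays the whole sequence call by call, while repeated calls on one instance keep advancing:

```python
import numpy as np

prng = np.random.RandomState(2019)
first = prng.randint(-1, 2, size=10)
second = prng.randint(-1, 2, size=10)  # continues the stream, not a repeat

fresh = np.random.RandomState(2019)    # same seed, new instance
replay_first = fresh.randint(-1, 2, size=10)
replay_second = fresh.randint(-1, 2, size=10)

print(np.array_equal(first, replay_first))    # True
print(np.array_equal(second, replay_second))  # True
```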
Kirk Walla
4

If you are using other functions that rely on a random state, you can't just set an overall seed. Instead, create a function that generates your random list of numbers and takes the seed as a parameter. This will not disturb any other random generators in the code:

import numpy as np

# Random states
def get_states(random_state, low, high, size):
    rs = np.random.RandomState(random_state)
    states = rs.randint(low=low, high=high, size=size)
    return states

# Call function
states = get_states(random_state=42, low=2, high=28347, size=25)
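One way to check the "does not disturb other generators" claim is sketched below. It repeats the function so the snippet runs on its own, and compares what the global generator returns with and without a call to `get_states` in between:

```python
import numpy as np

def get_states(random_state, low, high, size):
    rs = np.random.RandomState(random_state)  # private generator
    return rs.randint(low=low, high=high, size=size)

np.random.seed(0)
expected = np.random.rand(3)

np.random.seed(0)
states = get_states(random_state=42, low=2, high=28347, size=25)
actual = np.random.rand(3)  # global stream is unaffected by get_states

print(np.array_equal(expected, actual))  # True
```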
mari756h
4

According to the current NumPy "Random sampling" documentation, the preferred way is to use Generators instead of RandomState. Refer to "What's new or different" to compare both approaches. One of the key changes is the move from the slow Mersenne Twister pseudo-random number generator (RandomState) to a stream of random bits produced by different algorithms (BitGenerators), which the new approach (Generators) builds on.

Otherwise, the steps for producing a random numpy array are very similar:

  1. Initialize random generator

Instead of RandomState you initialize a random Generator. default_rng is the recommended constructor for the random Generator, but you can of course try other ways.

import numpy as np

rng = np.random.default_rng(42)
# rng -> Generator(PCG64)
  2. Generate numpy array

Instead of the randint method, there is the Generator.integers method, which is now the canonical way to generate integer random numbers from a discrete uniform distribution (see the already mentioned "What's new or different" summary). Note that endpoint=True samples from the closed interval [low, high] instead of the default half-open interval [low, high).

arr = rng.integers(-1, 1, size=10, endpoint=True)
# array([-1,  1,  0,  0,  0,  1, -1,  1, -1, -1])

As already discussed, you have to initialize the random generator (or random state) every time to generate an identical array. Therefore, the simplest thing is to define a custom function similar to the one from @mari756h's answer:

def get_array(low, high, size, random_state=42, endpoint=True):
    rng = np.random.default_rng(random_state)
    return rng.integers(low, high, size=size, endpoint=endpoint)

When you call the function with the same parameters you will always get the identical numpy array.

get_array(-1, 1, 10)
# array([-1,  1,  0,  0,  0,  1, -1,  1, -1, -1])

get_array(-1, 1, 10, random_state=12345)  # change random state to get different array
# array([ 1, -1,  1, -1, -1,  1,  0,  1,  1,  0])

get_array(-1, 1, (2, 2), endpoint=False)
# array([[-1,  0],
#        [ 0, -1]])

And for your needs you would use get_array(-1, 1, size=(100, 2000)).
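Written out without the helper, a sketch for the shape in the question could look like this (the seed 42 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
arr = rng.integers(-1, 1, size=(100, 2000), endpoint=True)

# A second Generator with the same seed reproduces the array
arr_again = np.random.default_rng(42).integers(-1, 1, size=(100, 2000), endpoint=True)

print(arr.shape)                       # (100, 2000)
print(np.unique(arr).tolist())         # [-1, 0, 1]
print(np.array_equal(arr, arr_again))  # True
```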

Nerxis
3

It is important to understand what the seed of a random generator is and when/how it is set in your code (check e.g. here for a nice explanation of the mathematical meaning of the seed).

For that you need to set the seed by doing:

random_state = np.random.RandomState(seed=your_favorite_seed_value)

It is then important to generate the random numbers from random_state and not from np.random. I.e. you should do:

random_state.randint(...)

instead of

np.random.randint(...) 

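which draws from NumPy's module-level global RandomState instead of your instance. Unless you seed that global generator explicitly, it starts from an unpredictable state (taken from operating-system entropy, not from a fixed value), so the results change from run to run.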

t_sic