19

I would like a function that generates a pseudo-random sequence of values, with that sequence being repeatable on every run. The data has to be reasonably well distributed over a given range; it doesn't have to be perfect.

I want to write some code which will have performance tests run on it, based on random data. I would like that data to be the same for every test run, on every machine, but I don't want to have to ship the random data with the tests for storage reasons (it might end up being many megabytes).

The documentation for the random module doesn't appear to say that the same seed will always give the same sequence on any machine.

EDIT: If you're going to suggest I seed the generator (as I said above), please point to the documentation that says the approach is valid and will work across a range of machines/implementations.

EDIT: CPython 2.7.1 and PyPy 1.7 on Mac OS X and CPython 2.7.1 and CPython 2.5.2 on Ubuntu appear to give the same results. Still, no docs that stipulate this in black and white.

Any ideas?

Joe
  • 3
    Have you *tried* generating a sequence with a given seed multiple times? –  Jan 26 '12 at 19:00
  • I only have one computer and one operating system, so I can't reliably test this. – Joe Jan 26 '12 at 19:03
  • I think the fundamental question is "for what?" If it's for a cipher, it's a very bad idea, so don't do it. You must say what it's for. – theWalker Jan 26 '12 at 19:05
  • 1
    @skippy: please read his question. He clearly says he wants them for performance tests based on random data, which is a perfectly sensible thing to want. – DSM Jan 26 '12 at 19:07
  • Just a quick comment to save others from my rookie mistake: random.seed() works for making random repeatable, but you will only see the same results if your **input data** is also the same. It sounds obvious, but be sure to check it. – Stephen Aug 23 '17 at 20:01

9 Answers

24

For this purpose, I've used a repeating MD5 hash, since a hash function is a deterministic, platform-independent transformation: the same input always gives the same output on any platform.

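import hashlib  # replaces the deprecated md5 module

def repeatable_random(seed):
    # Re-hash the previous digest forever; each MD5 digest supplies 16 byte values.
    digest = seed
    while True:
        digest = hashlib.md5(digest).digest()
        for c in digest:
            yield ord(c)

def test():
    # Print the first 100 values of the sequence for a fixed seed.
    for i, v in zip(range(100), repeatable_random("SEED_GOES_HERE")):
        print v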

Output:

184 207 76 134 103 171 90 41 12 142 167 107 84 89 149 131 142 43 241 211 224 157 47 59 34 233 41 219 73 37 251 194 15 253 75 145 96 80 39 179 249 202 159 83 209 225 250 7 69 218 6 118 30 4 223 205 91 10 122 203 150 202 99 38 192 105 76 100 117 19 25 131 17 60 251 77 246 242 80 163 13 138 36 213 200 135 216 173 92 32 9 122 53 250 80 128 6 139 49 94

Essentially, the code will take your seed (any valid string) and repeatedly hash it, thus generating integers from 0 to 255.
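
The code above targets Python 2. As a minimal sketch of the same idea for Python 3 (where the `md5` module is gone and the seed has to be `bytes`), something like the following should work. The helper name `take` is an assumption made for illustration and is not part of the original answer.

```python
import hashlib
from itertools import islice


def repeatable_random(seed: bytes):
    """Yield an endless stream of integers in 0..255 by re-hashing the seed."""
    digest = seed
    while True:
        digest = hashlib.md5(digest).digest()  # 16 bytes per round
        # On Python 3, iterating over bytes yields ints directly (no ord needed).
        for value in digest:
            yield value


def take(n, seed=b"SEED_GOES_HERE"):
    """Hypothetical helper: return the first n values for a given seed."""
    return list(islice(repeatable_random(seed), n))


if __name__ == "__main__":
    print(take(10))
```

Calling `take(100)` twice returns the same list, because nothing but the seed feeds the generator.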

DrRobotNinja
12

There are platform differences, so if you move your code between different platforms I would go for the method that DrRobotNinja described.

Please take a look at the following example. Python on my desktop machine (64-bit Ubuntu with a Core i7, Python 2.7.3) gives me the following:

> import random
> r = random.Random()
> r.seed("test")
> r.randint(1,100)
18

But if I run the same code on my Raspberry Pi (Raspbian on ARM11), I get a different result (for the same version of Python):

> import random
> r = random.Random()
> r.seed("test")
> r.randint(1,100)
34
Joppe
  • I don't know if this is a documented behavior though. It seems strange that this should be platform dependent, when such a large part of the Python standard library is designed to work cross platform. Maybe I should file a bug with the Python team? – Joppe Oct 05 '13 at 13:13
  • That's a good idea (after first looking through the bug reports). If it's not a bug there will be a good explanation why. – Joe Oct 06 '13 at 17:12
  • I just had the same issue on two copies of Ubuntu. Both the same Linux version, both the same Python version (2.7.3), both the same GCC version. However, one is 32 bit and the other is 64 bit. The 64 bit machine gives the same as your 64 bit version (18) and my 32 bit machine gives the same as your Pi (34). This must be a 32/64 bit thing. Was a bug report ever created? – Tom17 Jan 24 '14 at 19:47
  • No, I unfortunately never filed any bug report. Sorry about that. – Joppe Jan 28 '14 at 10:00
  • 4
    I didn't add a link to my answer to this problem when I wrote it, so here it is: https://stackoverflow.com/a/26592047/1065901 Basically, the problem is that you do not initialize the generator with an integer, but with some other value that has to be hashed first (and that's the platform-dependent part). – causa prima May 30 '17 at 07:26
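
The last comment points at the cause: a non-integer seed has to be hashed first, and that hash differed between the two platforms. One way to sidestep this, sketched below under the assumption that Python 3 is available, is to turn the string into an integer yourself with a fixed hash and seed with that integer. The helper names `int_seed_from_string` and `seeded_rng` are hypothetical.

```python
import hashlib
import random


def int_seed_from_string(text: str) -> int:
    """Hypothetical helper: derive an integer seed from a string via SHA-256."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def seeded_rng(text: str) -> random.Random:
    """Return a private generator seeded with the derived integer."""
    return random.Random(int_seed_from_string(text))


if __name__ == "__main__":
    rng = seeded_rng("test")
    print([rng.randint(1, 100) for _ in range(5)])
```

Because SHA-256 is defined byte for byte, the derived integer does not depend on the platform's word size, so the seed handed to the generator should be the same everywhere.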
8

If the quality of the random numbers isn't as critical as the repeatability-across-platforms, you can use one of the traditional linear congruential generators:

class lcg(object):
    def __init__( self, seed=1 ):
        self.state = seed

    def random(self):
        self.state = (self.state * 1103515245 + 12345) & 0x7FFFFFFF
        return self.state

Since this is coded in your program using integer arithmetic, it should be deterministically repeatable across any reasonable platform.
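
As a usage sketch, the generator above can be extended with a method that maps its output onto a range. The `randrange` method below is a hypothetical addition, not part of the original answer; it scales the whole 31-bit value rather than using `% n`, since the low-order bits of a power-of-two-modulus LCG repeat with short periods.

```python
class LCG:
    """Linear congruential generator using the constants from the answer above."""

    def __init__(self, seed=1):
        self.state = seed

    def random(self):
        # Python integers never overflow, so the mask alone keeps 31 bits of state.
        self.state = (self.state * 1103515245 + 12345) & 0x7FFFFFFF
        return self.state

    def randrange(self, n):
        """Hypothetical helper: return a value in [0, n) from the high bits."""
        # state < 2**31, so (state * n) >> 31 always lands in [0, n).
        return (self.random() * n) >> 31


if __name__ == "__main__":
    gen = LCG(seed=1)
    print([gen.randrange(100) for _ in range(10)])
```

Two `LCG` objects created with the same seed step through identical states, so their outputs match call for call.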

Evgeni Sergeev
Russell Borogove
7

This also explains why the example in the other answer, which seeds the generator with a string, produces different output on different machines:

It is because, when seeding the random generator, the seed has to be an integer. If you seed the generator with a non-integer, it has to be hashed first. The hash functions themselves are not platform independent (obviously at least not all of them; correct me if you know more).

So to pull it all together: Python uses a pseudo-random number generator. Therefore, when started from the same state, the produced sequence of random numbers will always be the same, independent of platform. It is just a deterministic algorithm without further input from the outside world.

This means: as long as you initialize your random generator with the same state, it will produce the same sequence of numbers. Getting to the same state can be done using the same integer seed or by saving and reapplying the old state (random.getstate() and random.setstate()).
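
Both routes to the same state can be sketched as follows. This is only an illustration; the seed value 12345 and the variable names are arbitrary choices, not taken from the answer.

```python
import random

rng = random.Random(12345)          # private generator with an integer seed
first = [rng.random() for _ in range(3)]

snapshot = rng.getstate()           # capture the state after three draws
expected = [rng.random() for _ in range(3)]

rng.setstate(snapshot)              # rewind to the captured state
replayed = [rng.random() for _ in range(3)]

rerun = random.Random(12345)        # fresh generator, same integer seed
rerun_first = [rerun.random() for _ in range(3)]

if __name__ == "__main__":
    print(replayed == expected, rerun_first == first)
```

Here `replayed` repeats `expected` because the state was restored, and `rerun_first` repeats `first` because both generators started from the same integer seed.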

causa prima
  • 1
    This seems like a genuine, if partial answer, and certainly adds information. If I were you, I'd remove the apology at the top! – Joe Oct 27 '14 at 16:24
  • 1
    It started out much shorter but developed into an elaborated answer, so you are right, and I removed/rephrased it. – causa prima Oct 27 '14 at 20:08
7

Specify a seed to the random number generator. If you provide the same seed, your random numbers should also be the same.

http://docs.python.org/library/random.html#random.seed

Oleksi
  • 1
    That's what I thought, but as I said in the question, I cannot see any documentation that backs this up. – Joe Jan 26 '12 at 18:58
  • @Oleksi - only on this implementation of Python on this operating system on this machine. My requirements are that it behaves the same over different implementations (for starters, the docs seem to suggest that the random seed is generated in a C module. What about PyPy?) – Joe Jan 26 '12 at 19:05
  • 3
    @Joe It's not defined because that's part of the formal **definition** of a seed. There's no pseudo random algorithm that will give different results with the same seed, that's just impossible. I assume they could mention it, but they probably thought it was obvious to everyone. – Voo Jan 26 '12 at 19:11
  • As far as I know, all pseudo random number generators maintain this property where the same seed will generate the same set of random numbers. It seems to be a property of the underlying algorithms used to generate the numbers. – Oleksi Jan 26 '12 at 19:13
  • 3
    I have just as much conjecture as everyone else, I came here because I could not back up my assumptions with facts. Are we guaranteed that the same algorithm will always be used? – Joe Jan 26 '12 at 19:20
6

The documentation does not explicitly say that providing a seed will always guarantee the same results, but that is guaranteed with Python's implementation of random based on the algorithm that is used.

According to the documentation, Python uses the Mersenne Twister as the core generator. Once this algorithm is seeded, it does not take any external input that would change subsequent calls, so give it the same seed and you will get the same results.

Of course you can also observe this by setting a seed and generating large lists of random numbers and verifying that they are the same, but I understand not wanting to trust that alone.
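
A rough way to run that check across machines, assuming Python 3, is to reduce a long seeded sequence to a short digest and compare the digests. The helper name `sequence_fingerprint`, the seed 321, and the draw count are assumptions chosen for illustration.

```python
import hashlib
import random


def sequence_fingerprint(seed: int, count: int = 100_000) -> str:
    """Hypothetical helper: SHA-256 over `count` seeded 32-bit draws."""
    rng = random.Random(seed)
    h = hashlib.sha256()
    for _ in range(count):
        # getrandbits returns an integer, which avoids float formatting issues.
        h.update(rng.getrandbits(32).to_bytes(4, "big"))
    return h.hexdigest()


if __name__ == "__main__":
    print(sequence_fingerprint(321))
```

Matching digests on two machines show that the sequences agreed there; they are evidence, not a documented guarantee.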

I have not checked other Python implementations besides CPython, but I highly doubt they would implement the random module using an entirely different algorithm.

Andrew Clark
  • 1
    That's what I thought. I will probably end up doing this as the least-worst solution. – Joe Jan 26 '12 at 19:12
  • Even if they used a completely different algorithm, if you gave the same seed to the same pseudo random algorithm it'll spit out the same sequence of numbers - you would run into problems if you wanted to test on two different implementations of python that used a different algorithm but that's about it. But as I understand the docs they guarantee the underlying algorithm too, so that's all fine. – Voo Jan 26 '12 at 19:16
  • It occurs to me that you still might get different results if you have a 32-bit Mersenne Twister vs. a 64-bit Mersenne Twister – DrRobotNinja Jan 11 '14 at 00:18
5

Using random.seed(...), you can generate a repeatable sequence. A demonstration:

import random

random.seed(321)
list1 = [random.randint(1,10) for x in range(5)]

random.seed(321)
list2 = [random.randint(1,10) for x in range(5)]

assert(list1==list2)

This works because the random module is not truly random: it's pseudo-random, whereby successive numbers are produced by advancing an internal state from an initial starting condition, the 'seed'.

Liam M
  • Use `random.Random` class instead as the above will alter module level `seed` and you can get into trouble if your code calls randint from other places. – Murali KG Jan 25 '19 at 11:50
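
The suggestion in the comment above can be sketched like this: each data set gets its own `random.Random` instance, so calls made elsewhere on the module-level generator cannot disturb it. The helper name `make_data` is hypothetical.

```python
import random


def make_data(seed, n=5):
    """Hypothetical helper: n integers in 1..10 from a private generator."""
    rng = random.Random(seed)   # independent of the module-level generator
    return [rng.randint(1, 10) for _ in range(n)]


list1 = make_data(321)
random.random()                 # unrelated call on the shared module-level generator
list2 = make_data(321)

if __name__ == "__main__":
    print(list1 == list2)
```

The unrelated `random.random()` call advances only the shared generator, so `list1` and `list2` still come out equal.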
1

I just tried the following:

import random
random.seed(1)
random.random()
random.random()
random.random()

random.seed(1)
random.random()
random.random()
random.random()

I entered each line at the interactive prompt at various speeds, several times over, and got the same values each time.

biscuit314
0

One option is to use numpy.random, which has a goal of being platform agnostic; see also "cross platform numpy.random.seed()".

iNecas