134

How can I generate non-repetitive random numbers in numpy?

list = np.random.random_integers(20,size=(10))
Gulzar
  • 23,452
  • 27
  • 113
  • 201
Academia
  • 3,984
  • 6
  • 32
  • 49
  • What do you mean by "non-repetitive"? That the sequence of random numbers never recurs? This is not possible, since the state of the random number generator needs to fit in the finite memory of a computer. Or do you mean that no single number occurs twice? – Sven Marnach Dec 14 '11 at 13:57
  • 9
    Non-repetitive means that you have a list with no duplicates. – Polynomial Dec 14 '11 at 13:58
  • 2
    Perhaps you need a random permutation? http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.permutation.html – cyborg Dec 14 '11 at 14:10

6 Answers6

174

numpy.random.Generator.choice offers a replace argument to sample without replacement:

from numpy.random import default_rng

rng = default_rng()
numbers = rng.choice(20, size=10, replace=False)

If you're on a pre-1.17 NumPy, without the Generator API, you can use random.sample() from the standard library:

print(random.sample(range(20), 10))

You can also use numpy.random.shuffle() and slicing, but this will be less efficient:

a = numpy.arange(20)
numpy.random.shuffle(a)
print a[:10]

There's also a replace argument in the legacy numpy.random.choice function, but this argument was implemented inefficiently and then left inefficient due to random number stream stability guarantees, so its use isn't recommended. (It basically does the shuffle-and-slice thing internally.)

Some timings:

import timeit
print("when output size/k is large, np.random.default_rng().choice() is far far quicker, even when including time taken to create np.random.default_rng()")
print(1, timeit.timeit("rng.choice(a=10**5, size=10**4, replace=False, shuffle=False)", setup="import numpy as np; rng=np.random.default_rng()", number=10**3)) #0.16003450006246567
print(2, timeit.timeit("np.random.default_rng().choice(a=10**5, size=10**4, replace=False, shuffle=False)", setup="import numpy as np", number=10**3)) #0.19915290002245456

print(3, timeit.timeit("random.sample( population=range(10**5), k=10**4)", setup="import random", number=10**3))   #5.115292700007558

print("when output size/k is very small, random.sample() is quicker")
print(4, timeit.timeit("rng.choice(a=10**5, size=10**1, replace=False, shuffle=False)", setup="import numpy as np; rng=np.random.default_rng()", number=10**3))  #0.01609779999125749
print(5, timeit.timeit("random.sample( population=range(10**5), k=10**1)", setup="import random", number=10**3))  #0.008387799956835806

So numpy.random.Generator.choice is what you usually want to go for, except for very small output size/k.

Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
  • 1
    What if my n is not 20, but like 1000000, but I need only 10 unique numbers from it, is there more memory efficient approach? – mrgloom Sep 08 '18 at 18:24
  • 2
    @mrgloom In Python 3, `random.sample(range(n), 10))` will be efficient even for very large `n`, since a `range` object is just a small wrapper storing start, stop and step values, but does not create the full list of integers. In Python 2, you can replace `range` with `xrange` to get a similar behaviour. – Sven Marnach Sep 09 '18 at 18:35
144

I think numpy.random.sample doesn't work right, now. This is my way:

import numpy as np
np.random.choice(range(20), 10, replace=False)
xav
  • 5,452
  • 7
  • 48
  • 57
strnam
  • 1,616
  • 1
  • 10
  • 6
  • 31
    Instead of `range(n)` (or `arange(n)`) as the first argument of `choice`, it is equivalent to just pass `n`, e.g. `choice(20, 10, replace=False)`. – Josh Bode Oct 04 '14 at 05:48
  • 2
    Note that `np.random.choice(a, size, replace=False)` is very slow for large `a` - on my machine, around 30 ms for a=1M. – Matthew Rahtz Jul 14 '19 at 20:13
  • 5
    To avoid time and memory issues for very large `n` use `numpy.random.Generator.choice` (starting with numpy v1.17) – benbo Sep 27 '19 at 18:40
  • 1
    The main disadvantage I see is np.random.choice does not have an axis parameter -> it's only for 1d arrays. – ssp Mar 16 '20 at 03:05
  • @ssp Then just get enough random numbers and reshape... – user654123 Aug 14 '22 at 10:59
8

Years later, some timeits for choosing 40000 out of 10000^2 (Numpy 1.8.1, imac 2.7 GHz):

import random
import numpy as np

n = 10000
k = 4
np.random.seed( 0 )

%timeit np.random.choice( n**2, k * n, replace=True )  # 536 µs ± 1.58 µs
%timeit np.random.choice( n**2, k * n, replace=False ) # 6.1 s ± 9.91 ms

# https://docs.scipy.org/doc/numpy/reference/random/index.html
randomstate = np.random.default_rng( 0 )
%timeit randomstate.choice( n**2, k * n, replace=False, shuffle=False )  # 766 µs ± 2.18 µs
%timeit randomstate.choice( n**2, k * n, replace=False, shuffle=True )   # 1.05 ms ± 1.41 µs

%timeit random.sample( range( n**2 ), k * n )          # 47.3 ms ± 134 µs

(Why choose 40000 out of 10000^2 ? To generate large scipy.sparse.random matrices -- scipy 1.4.1 uses np.random.choice( replace=False ), slooooow.)

Tip of the hat to numpy.random people.

denis
  • 21,378
  • 10
  • 65
  • 88
3

You can get this by sorting as well:

random_numbers = np.random.random([num_samples, max_int])
samples = np.argsort(random_numbers, axis=1)
Ben
  • 953
  • 8
  • 20
0

Python set-list conversion can be used. 10 random non repetitive numbers between 0 and 20 can be obtained as:

import random
numbers=set()
while(len(numbers)<10):
    numbers.add(random.randint(0,20))

numbers=list(numbers)
random.shuffle(numbers)
print(numbers)
Ali Şentürk
  • 321
  • 3
  • 6
-4

Simply generate an array that contains the required range of numbers, then shuffle them by repeatedly swapping a random one with the 0th element in the array. This produces a random sequence that doesn't contain duplicate values.

Polynomial
  • 27,674
  • 12
  • 80
  • 107
  • 2
    Another property of the resulting random sequence is that [it is not particularly random](http://www.codinghorror.com/blog/2007/12/the-danger-of-naivete.html). – Sven Marnach Dec 14 '11 at 14:04
  • @SvenMarnach - For most purposes, though, it's random enough. He could use the double-random approach if he wanted it more random. – Polynomial Dec 14 '11 at 14:08
  • 1
    This is pointless. The OP can use library calls to do it right. They are easier to use, run faster and are more readable than a custom version. I can't think of any reason why I should use a wrong algorithm here just because it is probably "random enough", when using the right algorithm has no disadvantage whatsoever. – Sven Marnach Dec 14 '11 at 14:12
  • @SvenMarnach - Fair enough. I don't know numpy, so I was just offering a potential solution. – Polynomial Dec 14 '11 at 14:13