0

I want to generate k unique values with numpy.random, drawn from a uniform distribution between 0 and N (excluding N), where k << N.

At first glance it looks like numpy.random.choice is the right approach: np.random.choice(N,k,replace=False) and that works in theory, but there's a gotcha from the docs:

Parameters

a : 1-D array-like or int If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if a were np.arange(a)

so if N is large, it effectively calls np.arange(N) which creates an unnecessary array and can get very slow.

Is there a way to create a "virtual" large array for numpy to use? I can't figure out if it is straightforward to do so based on what "array-like" means.

Alternatively is there another way to do this using numpy.random but without using np.random.choice?

The "obvious" Python duck-typing approach below works correctly but is even slower (perhaps numpy tries to create a copy first?)

class VirtualArray(object):
    def __init__(self, N):
        self.N = N
    def __len__(self):
        return self.N
    def __getitem__(self, k):
        if 0 <= k < self.N:
            return k
        raise IndexError('index out of range: %s' % k)

N = 1000000
np.random.seed(123)    
a = np.random.choice(VirtualArray(N),20,replace=False)
np.random.seed(123)    
b = np.random.choice(N,20,replace=False)
print (a)
print (b)
Jason S
  • 184,598
  • 164
  • 608
  • 970
  • *unique* is the key element. (random samples without replacement) – Jason S Nov 05 '20 at 20:37
  • 1
    If $k << N$ and $N$ is very large, then perhaps using a set $S$ and a while loop might be faster: generate a single random number at a time, while the length of the set $S$ is smaller than $k$, check to see if the generated number is in the set already, if not, add it to the set. It's not very pythonic, unfortunately. – Math Helper Nov 05 '20 at 20:45
  • never mind, https://stackoverflow.com/questions/8505651/non-repetitive-random-number-in-numpy mentions use of `random.sample(range(N),k)` – Jason S Nov 05 '20 at 20:47

1 Answers1

0

aha, random.sample seems to work as per Non-repetitive random number in numpy

import random
%timeit x=random.sample(range(1000*1000*1000*1000),10000)

which prints on my Jupyter console:

6.51 ms ± 187 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Jason S
  • 184,598
  • 164
  • 608
  • 970