2

I want to replicate MATLAB's randperm() with NumPy.

Currently, to get randperm(n, k) I use np.random.permutation(n)[:k]. The problem is it allocates an array of size n then takes only k entries of it.

Is there a more memory efficient way to directly create the array?

Royi
  • 4,640
  • 6
  • 46
  • 64
  • Does MATLAB do something different? – hpaulj Aug 07 '22 at 11:06
  • 1
    Does this answer your question? [Comparison of np.random.choice vs np.random.shuffle for samples without replacement](https://stackoverflow.com/questions/65218695/comparison-of-np-random-choice-vs-np-random-shuffle-for-samples-without-replacem) – Peter O. Aug 07 '22 at 11:23
  • What are your typical n,k values? If n is huge and k is small, you could get away with some custom implementation. – bzu Aug 07 '22 at 12:08

3 Answers3

2

I can recommend you np.random.choice(n, k, replace = False). Yet, I am not sure about memory efficiency. Please refer to docs

Royi
  • 4,640
  • 6
  • 46
  • 64
TaQ
  • 162
  • 1
  • 7
  • 1
    I added `replace = False` which is a must to replicate MATLAB's behavior. Thanks for the answer. – Royi Aug 07 '22 at 12:27
2

numpy.random.choice(n, k, replace=False) is no more memory efficient than numpy.random.permutation(n)[:k]. It too creates an n-item temporary list, shuffles that list, and takes k items from that list. See:

However, numpy.random.* functions, such as numpy.random.choice and numpy.random.permutation, have become legacy functions as of NumPy 1.17, and their algorithms — inefficiencies and all — are expected to remain as they are for backward compatibility reasons (see the recent RNG policy for NumPy).

Fortunately, NumPy since version 1.17 has an alternative:numpy.random.Generator.choice, which uses a much more efficient implementation, as can be seen below:

In [227]: timeit np.random.choice(4000000, 48, replace = False)                                  
163 ms ± 19.3 ms per loop (mean ± std. Dev. Of 7 runs, 1 loop each)

In [228]: timeit np.random.permutation(4000000)[:48]                                             
178 ms ± 22.5 ms per loop (mean ± std. Dev. Of 7 runs, 1 loop each)

In [229]: r=numpy.random.default_rng()                                                           

In [230]: timeit r.choice(4000000,48,replace=False)                                              
14.5 µs ± 28.9 ns per loop (mean ± std. Dev. Of 7 runs, 100000 loops each)

If you use NumPy 1.17 or later, you should make use of the new pseudorandom number generation system introduced in version 1.17, including numpy.random.Generator, in newer applications.

Peter O.
  • 32,158
  • 14
  • 82
  • 96
0

Based on @TaQ answer:

np.random.choice(n, k, replace = False)

Is the equivalent to MATLAB's randperm().

Update: I will update his answer as well to mark it.

Royi
  • 4,640
  • 6
  • 46
  • 64