Numpy choose without replacement along specific dimension

Question

I'm choosing k elements without replacement from a sample, n distinct times, according to a specified distribution.

The iterative solution is simple:

for _ in range(n):
    np.random.choice(a, size=k, replace=False, p=p)

I can't set size=(k, n) because that would sample without replacement across all samples rather than within each one. Since a and n are large, I am hoping for a vectorized solution.

Consider `np.random.choice(np.arange(5), size=(3, 5), replace=False)`. This gives the error 'Cannot take a larger sample than population when replace=False'. What I want is to choose 3 values from `range(5)`, 5 times, each time without replacement. — Eric Kaschalk, Oct 12 '16 at 15:55
how large are `a` and `n`? are you sure this is not a case of premature optimization? — Aaron, Oct 12 '16 at 16:02
If you have large arrays, a loop may in fact be better in order to save on memory usage — Aaron, Oct 12 '16 at 16:13
I should have specified that in the end I will operate on the arrays all at once - so I either will join the choices into a single `(k, n)` array or generate it all at once. I asked this question to see if the latter is possible. — Eric Kaschalk, Oct 12 '16 at 16:24
You can preallocate the final result and store each iteration in successive columns. — Mad Physicist, Oct 12 '16 at 16:27
@Aaron, he just gave you an example of a and n values 2 comments above your first one. — Hedwin Bonnavaud, Nov 21 '21 at 09:56
@HedwinBonnavaud this was 5 years ago... evidently I commented before fully reading the comments, but like.... clearly it no longer matters. — Aaron, Nov 21 '21 at 18:43

hpaulj · Answer 1 · 2016-10-12T19:04:43.790

So the full iterative solution is:

In [158]: ll=[]
In [159]: for _ in range(10):
     ...:     ll.append(np.random.choice(5,3)) 
In [160]: ll
Out[160]: 
[array([3, 2, 4]),
 array([1, 1, 3]),
 array([0, 3, 1]),
 ...
 array([0, 3, 0])]
In [161]: np.array(ll)
Out[161]: 
array([[3, 2, 4],
       [1, 1, 3],
       ...
       [3, 0, 1],
       [4, 4, 2],
       [0, 3, 0]])

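That could be cast as a list comprehension: np.array([np.random.choice(5,3) for _ in range(10)]). Note that these calls use the default replace=True, which is why rows such as [1, 1, 3] contain repeats; pass replace=False to get unique values within each row.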

Or an equivalent where you preallocate A=np.zeros((10,3),int) and assign A[i,:]=np.random... inside the loop.

In other words you want choices from range(5), but want them to be unique only within rows.

The np.random.choice docs suggest an alternative:

>>> np.random.choice(5, 3, replace=False)
array([3,1,0])
>>> #This is equivalent to np.random.permutation(np.arange(5))[:3]

I'm wondering if I can generate

array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       ...
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])

and permute values within rows. But np.random.permutation only shuffles along the first axis, so whole rows move together and the values within each row keep their order. So I'm still stuck with iterating over rows to produce the choice without replacement.
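For the uniform case (no p), the row-wise permutation idea can be sketched without a Python loop. This is a minimal sketch, not part of the original answer: it assumes NumPy 1.20 or later for Generator.permuted, and the sizes and seed are illustrative values. The second variant sorts independent random keys per row, which works on older versions too.

```python
import numpy as np

rng = np.random.default_rng(0)    # seed chosen only for reproducibility
n_rows, population, k = 10, 5, 3  # illustrative sizes (assumed, not from the question)

# Option 1: tile range(5) into rows, then shuffle each row independently.
tiled = np.tile(np.arange(population), (n_rows, 1))
picks_permuted = rng.permuted(tiled, axis=1)[:, :k]

# Option 2: argsort of i.i.d. uniform keys yields a random permutation per row;
# the first k columns are then a uniform sample without replacement.
picks_argsort = rng.random((n_rows, population)).argsort(axis=1)[:, :k]
```

Both variants build an (n_rows, population) intermediate array, so they trade memory for speed; neither handles a non-uniform p.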

Nevermind, found [this](https://stackoverflow.com/questions/51279464/sampling-unique-column-indexes-for-each-row-of-a-numpy-array) which you pointed out — Josmoor98, Jan 29 '20 at 15:07
"np.random.choice(5, 3, replace=False) is equivalent to np.random.permutation(np.arange(5))[:3]", is it also equivalent in terms of computation time ? — Hedwin Bonnavaud, Nov 21 '21 at 10:00

score 1 · Answer 2 · answered Oct 12 '16 at 16:37

Here are a couple of suggestions.

You can preallocate the (n, k) output array, then do the choice multiple times:

result = np.zeros((n, k), dtype=a.dtype)
for row in range(n):
    result[row, :] = np.random.choice(a, size=k, replace=False, p=p)

Alternatively, you can precompute the n * k selection indices and then apply them to a all at once. Since you want to sample the indices without replacement within each group of k, you will still need to call np.random.choice in a loop:
```
indices = np.concatenate([np.random.choice(a.size, size=k, replace=False, p=p) for _ in range(n)])
result = a[indices].reshape(n, k)
```
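If the loop itself is the bottleneck, one loop-free alternative is the Gumbel top-k trick: add independent Gumbel noise to log(p) and take the k largest keys in each row. This is a swapped-in technique, not something proposed in the answers above, and weighted_choice_rows is a hypothetical helper name. The sketch assumes a is one-dimensional, p has one entry per element of a, and at least k entries of p are positive.

```python
import numpy as np

def weighted_choice_rows(a, p, n, k, rng=None):
    """Hypothetical helper: n independent weighted draws of k items, no repeats per row."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    p = np.asarray(p, dtype=float)
    # Perturb the log-probabilities with independent Gumbel noise, one row per draw.
    with np.errstate(divide="ignore"):  # log(0) = -inf keeps zero-probability items last
        keys = np.log(p) + rng.gumbel(size=(n, a.size))
    # The k largest keys in each row give k distinct indices for that row.
    idx = np.argsort(-keys, axis=1)[:, :k]
    return a[idx]

# Usage with illustrative values: 1000 rows of 4 distinct picks each.
a = np.arange(10) * 10
p = np.arange(1, 11) / 55.0
result = weighted_choice_rows(a, p, n=1000, k=4, rng=np.random.default_rng(0))
```

The intermediate keys array has shape (n, a.size), so for very large a and n the preallocated loop above may still be the better choice on memory grounds.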
