Efficient way of sampling from indices of a Numpy array?

Question

I'd like to sample from indices of a 2D Numpy array, considering that each index is weighted by the number inside of that array. The way I know it is with numpy.random.choice however that does not return the index but the number itself. Is there any efficient way of doing so?

Here is my code:

import numpy as np
A=np.arange(1,10).reshape(3,3)
A_flat=A.flatten()
d=np.random.choice(A_flat,size=10,p=A_flat/float(np.sum(A_flat)))
print d

See this: http://stackoverflow.com/a/10803136/553404 with minor modifications — YXD, Nov 04 '13 at 00:26
@MrE, but that means to make an extra array to store the indices, right? — Cupitor, Nov 04 '13 at 00:29
Yes. I'd save the output of `np.indices(A)`, flatten the result(ing tuples) as well as your array of weights, use the linked method and your result is then given by `flattened_indices_x[idx], flattened_indices[idx]`. EDIT: Actually you can probably use http://docs.scipy.org/doc/numpy/reference/generated/numpy.unravel_index.html to avoid creating the index array and get the 2d index straight from `idx` and your weights array shape. — YXD, Nov 04 '13 at 00:36

score 2 · Answer 1 · edited May 23 '17 at 12:33

2

To expand on my comment: Adapting the weighted choice method presented here https://stackoverflow.com/a/10803136/553404

def weighted_choice_indices(weights):
    cs = np.cumsum(weights.flatten())/np.sum(weights)
    idx = np.sum(cs < np.random.rand())
    return np.unravel_index(idx, weights.shape)

edited May 23 '17 at 12:33

Community

1
1

answered Nov 04 '13 at 00:47

YXD

31,741
15
75
115

Bi Rico · Accepted Answer · 2013-11-04T01:32:11.787

2

You could do something like:

import numpy as np

def wc(weights):
    cs = np.cumsum(weights)
    idx = cs.searchsorted(np.random.random() * cs[-1], 'right')
    return np.unravel_index(idx, weights.shape)

Notice that the cumsum is the slowest part of this, so if you need to do this repeatidly for the same array I'd suggest computing the cumsum ahead of time and reusing it.

edited Nov 04 '13 at 01:32

answered Nov 04 '13 at 00:52

Bi Rico

25,283
3
52
75

Unfortunately the weights are changing so I cannot do that. However your search is going to be costly right? I think keeping indices in a separate array would make computations cheaper, right? – Cupitor Nov 04 '13 at 01:06
1

The search is quite cheep, especially compared to `cusum` or `sum`. I don't see how indices in a separate array would help here, but maybe I'm missing something. The `cumsum` is not too bad either, but for people who need to take multiple samples from the same distribution, it's an easy optimization to do the `cumsum` ahead of time. – Bi Rico Nov 04 '13 at 01:30
2

@Naji, `cs` is sorted and `searchsorted()` exploits that to do a binary search - only `O(log(len(weights)))` comparisons are needed. Very cheap. – Tim Peters Nov 04 '13 at 02:55

Efficient way of sampling from indices of a Numpy array?

2 Answers2

Linked