5

I'd like to sample from indices of a 2D Numpy array, considering that each index is weighted by the number inside of that array. The way I know it is with numpy.random.choice however that does not return the index but the number itself. Is there any efficient way of doing so?

Here is my code:

import numpy as np
A=np.arange(1,10).reshape(3,3)
A_flat=A.flatten()
d=np.random.choice(A_flat,size=10,p=A_flat/float(np.sum(A_flat)))
print d
Cupitor
  • 11,007
  • 19
  • 65
  • 91
  • See this: http://stackoverflow.com/a/10803136/553404 with minor modifications – YXD Nov 04 '13 at 00:26
  • @MrE, but that means to make an extra array to store the indices, right? – Cupitor Nov 04 '13 at 00:29
  • 1
    Yes. I'd save the output of `np.indices(A)`, flatten the result(ing tuples) as well as your array of weights, use the linked method and your result is then given by `flattened_indices_x[idx], flattened_indices[idx]`. EDIT: Actually you can probably use http://docs.scipy.org/doc/numpy/reference/generated/numpy.unravel_index.html to avoid creating the index array and get the 2d index straight from `idx` and your weights array shape. – YXD Nov 04 '13 at 00:36

2 Answers2

2

To expand on my comment: Adapting the weighted choice method presented here https://stackoverflow.com/a/10803136/553404

def weighted_choice_indices(weights):
    cs = np.cumsum(weights.flatten())/np.sum(weights)
    idx = np.sum(cs < np.random.rand())
    return np.unravel_index(idx, weights.shape)
Community
  • 1
  • 1
YXD
  • 31,741
  • 15
  • 75
  • 115
2

You could do something like:

import numpy as np

def wc(weights):
    cs = np.cumsum(weights)
    idx = cs.searchsorted(np.random.random() * cs[-1], 'right')
    return np.unravel_index(idx, weights.shape)

Notice that the cumsum is the slowest part of this, so if you need to do this repeatidly for the same array I'd suggest computing the cumsum ahead of time and reusing it.

Bi Rico
  • 25,283
  • 3
  • 52
  • 75
  • Unfortunately the weights are changing so I cannot do that. However your search is going to be costly right? I think keeping indices in a separate array would make computations cheaper, right? – Cupitor Nov 04 '13 at 01:06
  • 1
    The search is quite cheep, especially compared to `cusum` or `sum`. I don't see how indices in a separate array would help here, but maybe I'm missing something. The `cumsum` is not too bad either, but for people who need to take multiple samples from the same distribution, it's an easy optimization to do the `cumsum` ahead of time. – Bi Rico Nov 04 '13 at 01:30
  • 2
    @Naji, `cs` is sorted and `searchsorted()` exploits that to do a binary search - only `O(log(len(weights)))` comparisons are needed. Very cheap. – Tim Peters Nov 04 '13 at 02:55