2

I have numpy two dimension array P such that P[i, j] >= 0 and all P[i, j] sums to one. How to choose pair of indexes (i, j) with probability P[i, j] ?

EDIT: I am interested in numpy build function. Is there something for this problem? May be for one dimensional array?

bmu
  • 35,119
  • 13
  • 91
  • 108
ashim
  • 24,380
  • 29
  • 72
  • 96
  • 1
    The easy way - pick a random number q [0,1]. Pick a random point in your matrix (list), if q < P[i,j] keep. If not, try again. The hard(er) way, build a cumlatative distribution function for your discrete prob. set. – Hooked Oct 10 '12 at 00:58
  • For a general algorithm (not necessarily numpy), see [this excellent article](http://www.keithschwarz.com/darts-dice-coins/), based on [this question](http://stackoverflow.com/questions/5027757/data-structure-for-loaded-dice). It lists a number of algorithms, including one that can pick a random element in O(1), with O(nlogn) initial setup. This question can probably be closed as a duplicate, unless numpy/scipy has a builtin method that would help. – interjay Oct 10 '12 at 01:20

3 Answers3

1

Here's a simple algorithm in python that does what you are expecting.

Let's take for example a single dimension array P equal to [0.1,0.3,0.4,0.2]. The logic can be extended to any number of dimensions.

Now we set each element to the sum of all the elements that precede it: P => [0, 0.1, 0.4, 0.8, 1]

Using a random generator, we generate numbers that are between 0 and 1. Let's say x = 0.2. Using a simple binary search, we can determine that x is between the first element and the second element. We just pick the first element for this value of x.

If you look closely, the chance that 0 =< X < 0.1 is 0.1. The chance that 0.1 =< x < 0.4 is 0.3 and so on.

For the 2D array, it is better to convert it to a 1D array, even though, you should be able to implement a 2D array binary search algorithm.

Samy Arous
  • 6,794
  • 13
  • 20
1
# setup
import bisect
import numpy as np
cs = P.cumsum()

# get random value
r = np.random.uniform()
i, j = divmod(cs.searchsorted(r), P.shape[1])

O(n) initial setup, O(log n) selection.

nneonneo
  • 171,345
  • 36
  • 312
  • 383
0

Sample setup:

P = np.random.random((20, 10)) # Initializing P matrix
P = P/P.sum() # Normalizing probability matrix

P.ravel() flattens the probability matrix P.

ij = np.random.choice(range(P.size), p=P.ravel()) 

ij tells which element from P was randomly selected using weights from probability matrix P

i, j = np.unravel_index(ij, P.shape) 

i, j are the coordinates of the selected element in matrix P

kbaltakys
  • 11
  • 3