0

Suppose we have a set x of N values {x_i; i=1,...,N} and a set of some associated probabilities {w_i; i=1,...,N}.

We want to get from the set x, a new set x^ of N values {x^_i; i=1,...,N} by choosing each value x_i from the setx according to the probability w_i. How do we code that (i.e. a pseudo code algorithm, that can be translated to any language).

EDIT: python code:

def resample(self,x,w):
    N = len(w)
    new_x = empty(N)
    c = cumsum(w)

    for i in range(N):
        r = random()
        for j in range(N):
            if( j == N-1 ):
                new_x[i] = x[j]
                break
            else:
                if( (c[j] <= r) and (r < c[j+1]) ):
                    new_x[i] = x[j+1]
                    break

    new_w = ones(N,dtype=float)/N
    return new_x, new_w
shn
  • 5,116
  • 9
  • 34
  • 62
  • what's the relationship between `x_i` and `w_i`? What do you mean "according to the probability `w_i`"? – Hans Z Jun 12 '12 at 13:50
  • related: http://stackoverflow.com/questions/352670/weighted-random-selection-with-and-without-replacement – Nate Kohl Jun 12 '12 at 13:53
  • @user995434 Regardless of whether or not it's homework, questions on SO are meant to show effort on the part of the asker. SO is not a place to get people to do it all for you, whether 'it' is homework or not. All you have said is 'I need this'. – Gareth Latty Jun 12 '12 at 14:16
  • @NateKohl I've added a python code, can you check if it is correct ? – shn Jun 12 '12 at 15:03

3 Answers3

4

You can call a function which gives you a random number between 0 and 1.
If the probabilities are w_1 = 0.2, w_2 = 0.5, w_3 = 0.3, you can:
Choose x_1 if you got a number between 0 and 0.2
Choose x_2 if you got a number between 0.2 and 0.7
Choose x_3 otherwise.

More generally, choose x_n if w_1 + ... + w_(n-1) <= random number < w_1 + ... + w_(n-1) + w_n

That's not the whole pseudocode, just an explanation of its most problematic part, but I think it should be enough if you have a basic understanding of your problem.

iCanLearn
  • 336
  • 1
  • 4
  • 17
  • I've added a python code, can you check if it is correct according to what you explained ? – shn Jun 12 '12 at 15:01
1

I think the best option is preprocessing the probability set and then getting a random value.

Let me explain what I mean:

First you create a new set, for example h_i in which you place the accumulated probability of each object.

x_i:{A,B,C,D}
w_i:{0.2,0.3,0.4,0.1}
h_i:{0.2,0.5,0.9,1}

The last element is of course 1. (but if it is not (you have missing cases) it still works.

Now you generate a random number 0≤r≤1 and lookup the first element whose h is greater than r.

For example if you get 0.56 you choose C because 0.9(h_C) > 0.56 and 0.5(h_B) ≤ 0.56

This operation can be expensive on arrays but if you choose a binary search tree for the storage of the set h_i you can get very good results.

That is if you want to choose lots of random values over the same probability set. If the set is constantly changing I would use another approach.

Erpheus
  • 826
  • 7
  • 17
  • I've added a python code, can you check if it is correct according to what you explained ? – shn Jun 12 '12 at 15:02
0
# import the random library
import random

# define the input data
X = ["A","B","C","D"]
w = [0.2,0.3,0.4,0.1]

# size of the new sample
n = 10

# empy list to store the result
Xp = []

# the actual code
while len(Xp) < n:
    random_choice = random.choice(w)
    if random_choice >= random.random():
        Xp.append(X[w.index(random_choice)])

# have a look
Xp

Out[39]:

['C', 'C', 'C', 'B', 'D', 'B', 'A', 'D', 'A', 'B']

omun
  • 113
  • 1
  • 7