
Given an array and a value k, write a function that returns the index of the element equal to k with probability k/sum(input array). Assume there are no duplicate numbers in the input array.

For example, if the input array is [1, 4, 2, 3], the function should have the following behavior:

return 0 with probability 1/10;

return 1 with probability 4/10;

return 2 with probability 2/10;

return 3 with probability 3/10;

Question 2: How to deal with it if there are duplicates in the array?

I was thinking binary search is a good way to find an element in an array; however, I haven't figured out how to connect it with the probability requirement.

Edit: As suggested, the linked question is similar to mine. However, its solution was not what I expected. I was looking for a solution built around binary search, which could potentially decrease the time complexity.

A related, useful technique: given a key, use binary search to find the first element in a sorted array that is bigger than the key.

icebox
  • Are there any restrictions on the size of the array and the size of numbers stored as its elements? – Dmytro Shevchenko Sep 30 '15 at 12:31
  • possible duplicate of [Select element from array with probability proportional to its value](http://stackoverflow.com/questions/16489449/select-element-from-array-with-probability-proportional-to-its-value) – Dmytro Shevchenko Sep 30 '15 at 12:33
  • Also, check http://stackoverflow.com/questions/9330394/ and http://stackoverflow.com/questions/3679694/ – Dmytro Shevchenko Sep 30 '15 at 12:34
  • There are no restrictions. The size can be huge and the numbers can be large. Thank you for the links; I didn't find them while I was searching. However, after reading them, their solutions were not what I expected. Sorting an array is O(n log n); I do like Juan Lopes's solution, which is O(n) with a smaller constant. – icebox Sep 30 '15 at 17:23

4 Answers


Sum all the elements (denote the sum S), then generate a random number r from 1 to S. Then iterate over the numbers a_i: if a_i is not less than r, return the index i; otherwise subtract a_i from r and continue until a value is returned. If you have a single query, you will not be able to improve on this solution.
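
For reference, here is a minimal Python sketch of this single-query scan (the function and variable names are my own, not from the answer):

import random

def weighted_index(a):
    # return index i with probability a[i] / sum(a), in one O(n) pass
    r = random.randint(1, sum(a))        # uniform in [1, S]
    for i, ai in enumerate(a):
        if r <= ai:                      # a_i is not less than r: stop here
            return i
        r -= ai                          # otherwise subtract a_i and keep going

# weighted_index([1, 4, 2, 3]) returns 1 with probability 4/10, 3 with 3/10, etc.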

EDIT (credit to Juan Lopes): However, if you need to answer multiple queries, you can precompute the prefix sums and combine them with binary search: for a random value k, find the maximum x for which the prefix sum a_0 + a_1 + ... + a_x is less than k (the answer is then index x + 1, or 0 if no such x exists). Note that after the prefix-sum precomputation, any sum a_0 + ... + a_x can be read off in constant time.
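
A hedged sketch of this multi-query variant (prefix sums once, then binary search per query; it is essentially what Juan Lopes's answer below implements, with illustrative names of my own):

import random, bisect, itertools

def make_sampler(a):
    prefix = list(itertools.accumulate(a))   # prefix[x] = a[0] + ... + a[x]
    total = prefix[-1]
    def query():
        # O(log n) per query: smallest x whose prefix sum is >= k
        k = random.randint(1, total)
        return bisect.bisect_left(prefix, k)
    return query

sample = make_sampler([1, 4, 2, 3])   # sample() now draws an index according to the weights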

Ivaylo Strandjev
  • Binary search will help here. Instead of iterating over all numbers in O(n), you can perform a binary search on the accumulated array in O(log n). – Juan Lopes Sep 30 '15 at 12:55
  • @JuanLopes good point. I have edited my answer to include the suggested improvement. – Ivaylo Strandjev Sep 30 '15 at 14:05
  • Thanks for the solution. However, the subtraction loop can be slow in the worst case; the prefix sum is probably better, especially when the array is large. – icebox Sep 30 '15 at 17:27

You can build an accumulated array B from the input, where B[i] = A[0] + A[1] + ... + A[i]. Generate a random integer x between 1 and sum(A), then binary-search B for the first element that is not less than x.

Here's an example in Python (using Python's bisect module, which is essentially a binary search).

import random, bisect, collections

def make_random(A):
    s = sum(A)
    # B[i] = A[0] + A[1] + ... + A[i]  (prefix sums)
    B = list(A)
    for i in range(1, len(B)):
        B[i] += B[i-1]
    def fn():
        # uniform r in [1, s]; the first prefix sum >= r marks the chosen index
        r = random.randint(1, s)
        return bisect.bisect_left(B, r)
    return fn

rnd = make_random([1, 4, 2, 3])

c = collections.Counter()
for i in range(10000):
    c[rnd()] += 1

print(c)

The result will look like:

Counter({1: 3960, 3: 3036, 2: 1992, 0: 1012})
Juan Lopes

Given an array and a value k, write a function that returns the index of the element equal to k with probability k/sum(input array)

You can reduce the problem to uniform sampling from [1, sum]. The idea is to build the cumulative list cum_distr of your initial list, uniformly sample a number r in [1, sum], and find the smallest i such that r <= cum_distr[i].

import random


def get_cum_distr(distr):
    # cum_distr[i] = distr[0] + ... + distr[i]
    cum_distr = []
    total = 0
    for i in range(len(distr)):
        total += distr[i]
        cum_distr.append(total)
    return cum_distr


def sampler(cum_distr):
    # uniform r in [1, total]; return the smallest i with r <= cum_distr[i]
    r = random.randint(1, cum_distr[-1])
    i = 0
    while r > cum_distr[i]:
        i += 1
    return i


distr = [1, 4, 2, 3]
cum_distr = get_cum_distr(distr)
# test the sampler
sample_size = 100000
count = dict()
for _ in range(sample_size):
    r = sampler(cum_distr)
    if r in count:
        count[r] += 1
    else:
        count[r] = 1
print(count)
# {0: 9996, 1: 40115, 2: 19934, 3: 29955}

Note that if searching for the index is expensive, you can use binary search instead, since cum_distr is non-decreasing.
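
For illustration, a drop-in binary-search version of that sampler (a sketch using the standard bisect module, not part of the original code):

import bisect, random

def sampler_bisect(cum_distr):
    # cum_distr is non-decreasing, so bisect_left finds the smallest i
    # with r <= cum_distr[i] in O(log n) instead of a linear scan
    r = random.randint(1, cum_distr[-1])
    return bisect.bisect_left(cum_distr, r)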

How to deal with it if there are duplicates in the array?

It doesn't matter: duplicates simply mean several indices carry the same weight, and the cumulative-sum construction handles that without any change.

sve

This looks like the naive sampler (and in fact it is), but there is a subtlety in the order in which the elements are examined. By putting the largest weights in front, the loop will often complete in only a couple of iterations. So, if the distribution is very skewed, this method can be faster on average.

[I used this trick to sample from the stochastic vectors used in the Markov nodes in Wakkerbot.]

#include <stdio.h>
#include <stdlib.h>

/* weights sorted in descending order; .ret holds the original index */
struct samp {
    int ret;
    unsigned weight;
} array[4] = { {1, 4}, {3, 3}, {2, 2}, {0, 1} };

unsigned sumweight = 10;

/* this is a *terrible* way to obtain a uniform random value */
#define urand(n) (random() % (n))

int sample(void)
{
    unsigned idx, val;

    val = urand(sumweight);

    /* walk the weights, subtracting as we go; stop in the bucket containing val */
    for (idx = 0; idx < 4; idx++) {
        if (val < array[idx].weight) return array[idx].ret;
        val -= array[idx].weight;
    }
    return -1; /* unreachable if sumweight matches the weights */
}

int main(void)
{
    int ret;
    unsigned loop;

    for (loop = 0; loop < 20; loop++) {
        ret = sample();
        printf("%u: %d\n", loop, ret);
    }
    return 0;
}
wildplasser