
Given an array and a value k, write a function that returns the index of the element equal to k with probability k/sum(input array). Assume there are no duplicate numbers in the input array.

For example, if the input array is [1, 4, 2, 3], the function should have the following behavior:

return 0 with probability 1/10;

return 1 with probability 4/10;

return 2 with probability 2/10;

return 3 with probability 3/10;

Question 2: How to deal with it if there are duplicates in the array?

I was thinking binary search is a good way to find an element in an array; however, I haven't figured out how to connect it with the probability requirement.

Edit: As suggested, the linked question is similar to mine. However, its solution was not what I expected. I was looking for a solution built around binary search, which could potentially decrease the time complexity.

A related, useful technique: given a key, use binary search to find the first element in a sorted array that is bigger than the key.

icebox
  • Are there any restrictions on the size of the array and the size of numbers stored as its elements? – Dmytro Shevchenko Sep 30 '15 at 12:31
  • possible duplicate of [Select element from array with probability proportional to its value](http://stackoverflow.com/questions/16489449/select-element-from-array-with-probability-proportional-to-its-value) – Dmytro Shevchenko Sep 30 '15 at 12:33
  • Also, check http://stackoverflow.com/questions/9330394/ and http://stackoverflow.com/questions/3679694/ – Dmytro Shevchenko Sep 30 '15 at 12:34
  • There are no restrictions. The size can be huge and the numbers can be large. Thank you for the links; I didn't find them while I was searching. However, after reading them, their solutions were not what I expected. Sorting an array is O(n log n); I do like Juan Lopes's solution, which is O(n) with a smaller constant. – icebox Sep 30 '15 at 17:23

4 Answers


Sum all the elements (denote the sum S), then generate a random number r from 1 to S. Then iterate over the numbers a_i: if a_i is not less than r, return the index i; otherwise subtract a_i from r and continue until a value is returned. If you have a single query, you will not be able to improve on this solution.
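
For reference, here is a minimal Python sketch of this single-query scan (the function and variable names are my own, not from the answer):

import random

def weighted_index(a):
    # return index i with probability a[i] / sum(a), in one O(n) pass
    r = random.randint(1, sum(a))        # uniform in [1, S]
    for i, ai in enumerate(a):
        if r <= ai:                      # a_i is not less than r: stop here
            return i
        r -= ai                          # otherwise subtract a_i and keep going

# weighted_index([1, 4, 2, 3]) returns 1 with probability 4/10, 3 with 3/10, etc.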

EDIT (credit to Juan Lopes): However, if you need to answer multiple queries, you can precompute the prefix sums and combine them with binary search: for a random value k, find the maximum x for which the prefix sum a_0 + a_1 + ... + a_x is less than k (the answer is then index x + 1, or 0 if no such x exists). Note that after the prefix-sum precomputation, any sum a_0 + ... + a_x can be read off in constant time.
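
A hedged sketch of this multi-query variant (prefix sums once, then binary search per query; it is essentially what Juan Lopes's answer below implements, with illustrative names of my own):

import random, bisect, itertools

def make_sampler(a):
    prefix = list(itertools.accumulate(a))   # prefix[x] = a[0] + ... + a[x]
    total = prefix[-1]
    def query():
        # O(log n) per query: smallest x whose prefix sum is >= k
        k = random.randint(1, total)
        return bisect.bisect_left(prefix, k)
    return query

sample = make_sampler([1, 4, 2, 3])   # sample() now draws an index according to the weights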

Ivaylo Strandjev
  • Binary search will help here. Instead of iterating over all numbers in O(n), you can perform a binary search on the accumulated array in O(log n). – Juan Lopes Sep 30 '15 at 12:55
  • @JuanLopes good point. I have edited my answer to include the suggested improvement. – Ivaylo Strandjev Sep 30 '15 at 14:05
  • Thanks for the solution. However, the subtraction loop can be slow in the worst case; the prefix sum is probably better, especially when the array is large. – icebox Sep 30 '15 at 17:27

You can build an accumulated array B from the input, where B[i] = A[0] + A[1] + ... + A[i]. Generate a random integer x between 1 and sum(A), then binary-search B for the first element that is not less than x.

Here's an example in Python (using Python's bisect module, which is essentially a binary search).

import random, bisect, collections

def make_random(A):
    s = sum(A)
    # B[i] = A[0] + A[1] + ... + A[i]  (prefix sums)
    B = list(A)
    for i in range(1, len(B)):
        B[i] += B[i-1]
    def fn():
        # uniform r in [1, s]; the first prefix sum >= r marks the chosen index
        r = random.randint(1, s)
        return bisect.bisect_left(B, r)
    return fn

rnd = make_random([1, 4, 2, 3])

c = collections.Counter()
for i in range(10000):
    c[rnd()] += 1

print(c)

The result will look like:

Counter({1: 3960, 3: 3036, 2: 1992, 0: 1012})
Juan Lopes

Given an array and a value k, write a function that returns the index of the element equal to k with probability k/sum(input array)

You can reduce the problem to uniform sampling from [1, sum]. The idea is to build the cumulative list cum_distr of your initial list, uniformly sample a number r in [1, sum], and find the smallest i such that r <= cum_distr[i].

import random


def get_cum_distr(distr):
    # cum_distr[i] = distr[0] + ... + distr[i]
    cum_distr = []
    total = 0
    for i in range(len(distr)):
        total += distr[i]
        cum_distr.append(total)
    return cum_distr


def sampler(cum_distr):
    # uniform r in [1, total]; return the smallest i with r <= cum_distr[i]
    r = random.randint(1, cum_distr[-1])
    i = 0
    while r > cum_distr[i]:
        i += 1
    return i


distr = [1, 4, 2, 3]
cum_distr = get_cum_distr(distr)
# test the sampler
sample_size = 100000
count = dict()
for _ in range(sample_size):
    r = sampler(cum_distr)
    if r in count:
        count[r] += 1
    else:
        count[r] = 1
print(count)
# {0: 9996, 1: 40115, 2: 19934, 3: 29955}

Note that if searching for the index is expensive, you can use binary search instead, since cum_distr is non-decreasing.
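
For illustration, a drop-in binary-search version of that sampler (a sketch using the standard bisect module, not part of the original code):

import bisect, random

def sampler_bisect(cum_distr):
    # cum_distr is non-decreasing, so bisect_left finds the smallest i
    # with r <= cum_distr[i] in O(log n) instead of a linear scan
    r = random.randint(1, cum_distr[-1])
    return bisect.bisect_left(cum_distr, r)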

How to deal with it if there are duplicates in the array?

It doesn't matter: duplicates simply mean several indices carry the same weight, and the cumulative-sum construction handles that without any change.

sve

This looks like the naive sampler (and in fact it is), but there is a subtlety in the order in which the elements are examined. By putting the largest weights in front, the loop will often complete in only a couple of iterations. So, if the distribution is very skewed, this method can be faster on average.

[I used this trick to sample from the stochastic vectors used in the Markov nodes in Wakkerbot.]

#include <stdio.h>
#include <stdlib.h>

/* weights sorted in descending order; .ret holds the original index */
struct samp {
    int ret;
    unsigned weight;
} array[4] = { {1, 4}, {3, 3}, {2, 2}, {0, 1} };

unsigned sumweight = 10;

/* this is a *terrible* way to obtain a uniform random value */
#define urand(n) (random() % (n))

int sample(void)
{
    unsigned idx, val;

    val = urand(sumweight);

    /* walk the weights, subtracting as we go; stop in the bucket containing val */
    for (idx = 0; idx < 4; idx++) {
        if (val < array[idx].weight) return array[idx].ret;
        val -= array[idx].weight;
    }
    return -1; /* unreachable if sumweight matches the weights */
}

int main(void)
{
    int ret;
    unsigned loop;

    for (loop = 0; loop < 20; loop++) {
        ret = sample();
        printf("%u: %d\n", loop, ret);
    }
    return 0;
}
wildplasser