
Given an array a of positive integers, the goal is to generate 5 random numbers, each chosen according to the weight it has in the array.

For example:

a = [2,3,4,4,4,4,4,6,7,8,9]

Here the number 4 appears 5 times out of 11 elements, so it should have a probability of 5/11 of being chosen.

No number should be repeated.
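To make the weights concrete: each distinct value's weight is its count divided by len(a). A minimal sketch using the standard library's collections.Counter and fractions.Fraction (my choice of tools, not something the question specifies):

```python
from collections import Counter
from fractions import Fraction

a = [2, 3, 4, 4, 4, 4, 4, 6, 7, 8, 9]

# Count how often each distinct value occurs in a.
counts = Counter(a)

# A value's weight is its count divided by the total number of elements.
weights = {value: Fraction(count, len(a)) for value, count in counts.items()}

print(weights[4])  # 5/11
```

Using Fraction keeps the weights exact, so they sum to exactly 1.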

Alex Riley
ccamacho
  • What do you mean by without repetition? In that case it does not preserve the weights. Do you mean without repetition of the index? – Tom Ron Jan 20 '16 at 15:33
  • Your question isn't quite clear, but if you mean what I think you mean, here's a duplicate: http://stackoverflow.com/questions/10803135/weighted-choice-short-and-simple – Alex Riley Jan 20 '16 at 15:34
  • are you after `np.random.choice(list(set(a)), size=5,replace=False)`? – EdChum Jan 20 '16 at 15:35
  • By doing list(set(a)) you are removing the weights. The thing is that I haven't calculated the weights the way they do in http://stackoverflow.com/questions/10803135/weighted-choice-short-and-simple; I only have the weights implicitly, as repeated numbers. @ajcr you are right, but I don't have the probability of each value. – ccamacho Jan 20 '16 at 15:43
  • @Charlie: ah I see, so given `a`, do you want to calculate the weights and then use the random choice function? – Alex Riley Jan 20 '16 at 15:44
  • @ajcr you are right :) – ccamacho Jan 20 '16 at 16:23
  • @Charlie: are the values in your array always positive integers? – Alex Riley Jan 20 '16 at 17:02

3 Answers


Given a, an array of positive integers, you'll first need to compute the frequency of each integer. For example, using bincount:

>>> import numpy as np
>>> a = [2,3,4,4,4,4,4,4,5,6,7,8,9,4,9,2,3,6,3,1]
>>> b = np.bincount(a)

b[i] is the number of times the integer i occurs in a, so the corresponding weights are the array b/len(a). Passing these weights to np.random.choice together with replace=False should then give you what you need:

>>> np.random.choice(np.arange(len(b)), 5, p=b/len(a), replace=False)
array([5, 9, 4, 3, 8])
>>> np.random.choice(np.arange(len(b)), 5, p=b/len(a), replace=False)
array([7, 4, 6, 9, 1])
>>> np.random.choice(np.arange(len(b)), 5, p=b/len(a), replace=False)
array([3, 7, 4, 9, 6])

If your values aren't non-negative integers, or max(a) is large (np.bincount allocates max(a) + 1 bins), @user2357112 points out in the comments below that np.unique with return_counts=True is a better option. Here you'd write:

>>> choices, counts = np.unique(a, return_counts=True)
>>> np.random.choice(choices, 5, p=counts/len(a), replace=False)
array([9, 8, 2, 4, 5])
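To illustrate why np.unique is the more general route, here is a sketch on hypothetical data containing negative and very large values, which np.bincount cannot handle. It again uses the Generator API rather than the legacy np.random functions, and the seed is arbitrary:

```python
import numpy as np

# Hypothetical data: negative and very large values, unlike the question's array.
a = [-3, -3, 10**9, 7, 7, 7, -1, 10**9, 42]

# np.bincount rejects negative input with a ValueError.
try:
    np.bincount(a)
    bincount_ok = True
except ValueError:
    bincount_ok = False

# choices holds the sorted distinct values; counts[i] is how often choices[i] occurs.
choices, counts = np.unique(a, return_counts=True)

rng = np.random.default_rng(seed=1)
sample = rng.choice(choices, size=3, replace=False, p=counts / len(a))
print(bincount_ok, sample)
```

Because np.unique only stores the distinct values, memory use depends on how many different values there are, not on how large they are.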
Alex Riley
  • You could also use [`numpy.unique`](http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.unique.html) with the `return_counts` option, which works much better when your input isn't a bunch of nonnegative integers or `max(a)` is large. – user2357112 Jan 20 '16 at 17:29
  • Thanks - I always seem to forget about the `return_counts` parameter - that would indeed be much better in the cases you've mentioned. – Alex Riley Jan 20 '16 at 17:33
  • Is it possible to update it to this example? a =[(23, 11), (10, 16), (13, 11), (12, 3), (4, 15), (10, 16), (10, 16)] b = Counter(elem for elem in a) c = len(a) @ajcr – ccamacho Jan 21 '16 at 19:26
  • @Charlie: I'd argue that your updated question is different enough to warrant asking a completely new question (list of integers vs. Counter object with tuples as keys). It's probably best to leave *this* page about the list of integers because it's probably the most common case and likely to be of help to future readers. If you ask a new question about Counter objects, I'll gladly take a look and try to answer it as best I can. – Alex Riley Jan 21 '16 at 20:36
  • @ajcr I have asked the new question; here it is (http://stackoverflow.com/questions/34934540/generate-a-list-of-random-weighted-tuples-from-a-list). Thanks! – ccamacho Jan 21 '16 at 21:18

You're probably looking for numpy.random.multinomial. For example, np.random.multinomial(1, [1/6.]*6, size=1) throws a fair die once. After you get a result, you can update the probability vector (which must sum to 1) as you wish, for example numpy.random.multinomial(1, [1/2., 1/2., 0., 0., 0., 0.], size=1).
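One way to read "update the probability vector" for this question is: after each draw, set the chosen outcome's probability to 0 and renormalise, so the next draw cannot repeat it. The sketch below assumes that interpretation; it builds the initial probabilities with np.unique, uses the Generator API (np.random.default_rng) instead of the legacy np.random.multinomial, and the function name sample_without_repeats is hypothetical:

```python
import numpy as np

def sample_without_repeats(values, k, rng):
    """Draw k distinct values, picking each one in proportion to its count
    among the values not yet chosen (one multinomial trial per pick)."""
    choices, counts = np.unique(values, return_counts=True)
    if k > len(choices):
        raise ValueError("k exceeds the number of distinct values")
    p = counts / counts.sum()
    picked = []
    for _ in range(k):
        # A single trial: the result has a 1 at the chosen index, 0 elsewhere.
        outcome = rng.multinomial(1, p)
        idx = int(np.argmax(outcome))
        picked.append(choices[idx].item())
        # Zero out the chosen value and renormalise so p still sums to 1.
        p[idx] = 0.0
        p = p / p.sum()
    return picked

rng = np.random.default_rng(seed=2)
a = [2, 3, 4, 4, 4, 4, 4, 6, 7, 8, 9]
print(sample_without_repeats(a, 5, rng))
```

The guard on k matters: once every value has been picked, p would be all zeros and the renormalisation would divide by zero.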

Farseer

The simplest (and perhaps least efficient) solution is the following:

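import random

def get_randoms(n, a):
    r = []
    for _ in range(n):
        # random.choice picks a list element uniformly, so each value is
        # effectively weighted by how many copies of it remain in a.
        r.append(random.choice(a))
        # Drop every copy of the chosen value so it cannot be picked again.
        a = [y for y in a if y != r[-1]]
    return r

print(get_randoms(5, [2,3,4,4,4,4,4,4,5,6,7,8,9,4,9,2,3,6,3,1]))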

The output would be something like

[8, 2, 3, 6, 9]
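The same idea can avoid rebuilding a filtered list on every iteration: keep a collections.Counter of the remaining values and let random.choices (Python 3.6+) do the weighted pick. This is a sketch under those assumptions; get_weighted_randoms is a hypothetical name, and it should give the same distribution as the loop above, since each draw is still proportional to the remaining counts:

```python
import random
from collections import Counter

def get_weighted_randoms(n, a, rng=random):
    """Pick n distinct values from a, each draw weighted by how often the
    value occurs among the values that have not been picked yet."""
    counts = Counter(a)
    if n > len(counts):
        raise ValueError("n exceeds the number of distinct values in a")
    picked = []
    for _ in range(n):
        values = list(counts)
        # random.choices draws with the given relative weights; k=1 -> one value.
        choice = rng.choices(values, weights=[counts[v] for v in values], k=1)[0]
        picked.append(choice)
        # Remove every copy of the chosen value so it cannot be drawn again.
        del counts[choice]
    return picked

print(get_weighted_randoms(5, [2, 3, 4, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 4, 9, 2, 3, 6, 3, 1]))
```

Passing a seeded random.Random instance as rng makes the result reproducible.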
Pykler