
I have two lists, a and b:

a = [0.1, 0.3, 0.1, 0.2, 0.1, 0.1, 0.1]

b = ['apple', 'gun', 'pizza', 'sword', 'pasta', 'chicken', 'elephant']

Now I want to create a new list c of 3 items.

The 3 items are chosen from list b based on the probabilities in list a.

The items should not repeat in list c.

For example, the output I am looking for is:

c = ['gun', 'sword', 'pizza']

or

c = ['apple', 'pizza', 'pasta']

Note: the values in list a sum to 1, and lists a and b have the same number of items. In practice I have a thousand items in both lists and want to select a hundred of them based on the probabilities assigned to them. I am using Python 3.

roganjosh
gokul gupta

2 Answers


Use random.choices:

>>> import random
>>> print(random.choices(
...     ['apple', 'gun', 'pizza', 'sword', 'pasta', 'chicken', 'elephant'], 
...     [0.1, 0.3, 0.1, 0.2, 0.1, 0.1, 0.1],
...     k=3
... ))
['gun', 'pasta', 'sword']

Edit: To avoid replacement, you can remove the selected item from the population:

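import random

def choices_no_replacement(population, weights, k=1):
    # Work on copies so the caller's sequences are not modified.
    population = list(population)
    weights = list(weights)
    result = []
    for _ in range(k):
        # Draw one index according to the remaining weights.
        pos = random.choices(range(len(population)), weights, k=1)[0]
        result.append(population[pos])
        # Remove the chosen item and its weight so it cannot be picked again.
        del population[pos], weights[pos]
    return result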

Testing:

>>> print(choices_no_replacement(
...     ['apple', 'gun', 'pizza', 'sword', 'pasta', 'chicken', 'elephant'],
...     [0.1, 0.3, 0.1, 0.2, 0.1, 0.1, 0.1],
...     k=3
... ))
['gun', 'pizza', 'sword']
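
For the sizes mentioned in the question (a thousand items, a hundred picks), the loop above rescans the remaining weights on every pick. One possible alternative, sketched here as an assumption rather than as part of the original answer, is Efraimidis-Spirakis weighted sampling: give every item a random key u ** (1 / w) and keep the k items with the largest keys. The function name weighted_sample_one_pass and the generated test data are hypothetical.

```python
import heapq
import random


def weighted_sample_one_pass(population, weights, k, rng=random):
    """Pick k distinct items; heavier weights are more likely to be picked.

    Hypothetical helper. Each item gets the random key u ** (1 / w), with
    u uniform in [0, 1), and the k items with the largest keys are kept
    (Efraimidis-Spirakis weighted sampling without replacement).
    """
    # Items with zero weight can never be selected, so skip them.
    keyed = [
        (rng.random() ** (1.0 / w), item)
        for item, w in zip(population, weights)
        if w > 0
    ]
    if k > len(keyed):
        raise ValueError("k exceeds the number of items with positive weight")
    # Compare on the key only, so the items themselves need not be orderable.
    top = heapq.nlargest(k, keyed, key=lambda pair: pair[0])
    return [item for _, item in top]


# Made-up data matching the sizes in the question: 1000 items, pick 100.
items = [f"item{i}" for i in range(1000)]
raw = [random.random() for _ in items]
total = sum(raw)
probs = [r / total for r in raw]
picked = weighted_sample_one_pass(items, probs, k=100)
print(len(picked), len(set(picked)))
```

The weights do not have to sum to 1 here, because only their relative sizes affect the keys.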
nosklo
  • The documentation suggests that this picks with replacement and I can't see a way to avoid that. – roganjosh Jul 14 '18 at 08:56
  • @roganjosh added a way to do it – nosklo Jul 14 '18 at 09:20
  • I think it's just easier to use the numpy approach :) My "can't see a way to avoid that" was more in reference to `choices` not having an argument to prevent replacement. It seems some way behind the numpy method, but it's getting there :) Maybe Python 3.8 will fix it. – roganjosh Jul 14 '18 at 09:22

If you don't want to fix in advance how many items to pick (so there is nothing like k=3) and you only have probabilities, you can do the following. Each item is tested against its own probability, so the probabilities do not need to add up to 1; they can be independent of each other. Admittedly, I am not addressing your issue of preventing possible repetitions:

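import random

a = [0.2, 0.3, 0.9, 0.1]
b = ['apple', 'gun', 'pizza', 'sword']

# Keep each item independently with its own probability.
selected_items = [item for p, item in zip(a, b) if random.random() < p]
print(selected_items)
# Example output (varies between runs): ['apple', 'pizza']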
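
Because every item is kept or dropped on its own, the length of the result changes from run to run, and on average it equals the sum of the probabilities. A small simulation, given only as an illustrative sketch, makes this visible; the helper name pick_independently, the seed, and the number of trials are assumptions chosen for the example.

```python
import random

a = [0.2, 0.3, 0.9, 0.1]
b = ['apple', 'gun', 'pizza', 'sword']


def pick_independently(probabilities, items, rng=random):
    # Each item is kept or dropped on its own, so the result size varies.
    return [item for p, item in zip(probabilities, items) if rng.random() < p]


rng = random.Random(0)  # seeded only to make the sketch repeatable
trials = 10_000
sizes = [len(pick_independently(a, b, rng)) for _ in range(trials)]
average_size = sum(sizes) / trials
print(average_size)  # close to sum(a), which is about 1.5
print(min(sizes), max(sizes))  # smallest and largest result seen
```

If a fixed number of items is required, as in the question, this approach is not a direct fit, since the result can hold anywhere from zero items to all of them.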
NeStack