Generate a list of random weighted tuples from a list

Question

Given a tuple list a:

a =[(23, 11), (10, 16), (13, 11),  (12, 3), (4, 15), (10, 16), (10, 16)]

We can count how many appearances of each tuple we have using Counter:

>>> from collections import Counter
>>> b = Counter(a)
>>> b
Counter({(4, 15): 1, (10, 16): 3, (12, 3): 1, (13, 11): 1, (23, 11): 1}

Now, the idea is to select 3 random tuples from the list, without repetition, such that the count determines the probability that a particular tuple is chosen.

For instance, (10, 16) is more likely to be chosen than the others - its weight is 3/7 while the other four tuples have weight 1/7.

I have tried to use np.random.choice:

a[np.random.choice(len(a), 3, p=b/len(a))]

But I'm not able to generate the tuples.

Im trying:

a =[(23, 11), (10, 16), (13, 11),  (10, 16), (10, 16), (10, 16), (10, 16)]
b = Counter(a)
c = []
print "counter list"
print b
for item in b:
    print "item from current list"
    print item
    print "prob of the item"
    print (float(b[item])/float(len(a)))

    c.append(float(b[item])/float(len(a)))

print "prob list"
print c

print (np.random.choice(np.arange(len(b)), 3, p=c, replace=False))

In this case im getting the random indexes of the array.

Is there any more optimized way not to have to calculate the probabilities array?
Also there is an issue that is that the prob array does not correspond to the Counter array.

Link to related question: http://stackoverflow.com/q/34904122/3923281 — Alex Riley, Jan 21 '16 at 22:51
@Charlie - I edited some details into your question based on your previous question (I think it should be clearer to those have not read that other question now.) — Alex Riley, Jan 21 '16 at 22:53
@ajcr I found a way to do it but the counter array when iterating is returning the values in an odd order.. — ccamacho, Jan 22 '16 at 09:51

score 0 · Answer 1 · answered Jan 22 '16 at 10:08

0

If you aren't interested in the intermediate step of calculating frequencies, you could use random.shuffle (either on the list or a copy) and then slice off as many items as you need.

e.g.

import random
a =[(23, 11), (10, 16), (13, 11),  (12, 3), (4, 15), (10, 16), (10, 16)]
random.shuffle(a)
random_sample = a[0:3]
print(random_sample)

As shuffle reorders in place it will avoid the repetition issue, and statistically should give the same result (excluding differences in random number generation between np and random).

answered Jan 22 '16 at 10:08

Lewis Fogden

515
5
8

Actually this is not following the definition of not having repeated items this can produce i.e. [(10, 16), (10, 16), (23, 11)] – ccamacho Jan 22 '16 at 11:13
I've interpreted no repetition to mean each item in the original list can only sampled once. If you are sampling from the reduced list with probabilities then I think you would need to recalculate the distribution each time an item is removed? 3/7, 1/7, 1/7, 1/7, 1/7 -> 1/4, 1/4, 1/4, 1/4 if (10,16) is removed 3/7, 1/7, 1/7, 1/7, 1/7 -> 3/6, 1/6, 1/6, 1/6 if any other removed – Lewis Fogden Jan 22 '16 at 12:16

score 0 · Accepted Answer · answered Jan 22 '16 at 15:56

This will do the trick

from collections import Counter
import matplotlib.pyplot as plt
import numpy as np
import random

listOfNumbers =[(23, 11), (10, 16), (13, 11),  (10, 16), (10, 16), (10, 16), (10, 16)]
b = Counter(listOfNumbers)
c = []
pres=[]
for k,v in b.most_common():
    c.append(float(v)/float(len(listOfNumbers)))
    pres.append(k)

resultIndex = np.random.choice(np.arange(len(b)), 3, p=c, replace=False)

ass=[]
for res in resultIndex:
    ass.append(pres[res])

print ass

Now is just to see if is there any way to optimize it.

score 0 · Answer 3 · answered Jan 23 '16 at 23:48

You can repeat the following steps 3 times:

Randomly chose a number i in range [0..n-1] where n is a current number of elements in a.
Find a tuple on i-th position in the initial a list. Add tuple to resulting triplet.
Remove all occurrences of tuple from a.

Pay attention to a corner case when a can be empty.

The overall time complexity will be O(n) for a list.

On the first step number i should be generated according to uniform distribution which regular random provides. The more occurrences of a particular tuple are in a the more likely it will be chosen.

Generate a list of random weighted tuples from a list

3 Answers3

Linked