-1

Given a tuple list a:

a =[(23, 11), (10, 16), (13, 11),  (12, 3), (4, 15), (10, 16), (10, 16)]

We can count how many appearances of each tuple we have using Counter:

>>> from collections import Counter
>>> b = Counter(a)
>>> b
Counter({(4, 15): 1, (10, 16): 3, (12, 3): 1, (13, 11): 1, (23, 11): 1}

Now, the idea is to select 3 random tuples from the list, without repetition, such that the count determines the probability that a particular tuple is chosen.

For instance, (10, 16) is more likely to be chosen than the others - its weight is 3/7 while the other four tuples have weight 1/7.

I have tried to use np.random.choice:

a[np.random.choice(len(a), 3, p=b/len(a))]

But I'm not able to generate the tuples.

Im trying:

a =[(23, 11), (10, 16), (13, 11),  (10, 16), (10, 16), (10, 16), (10, 16)]
b = Counter(a)
c = []
print "counter list"
print b
for item in b:
    print "item from current list"
    print item
    print "prob of the item"
    print (float(b[item])/float(len(a)))

    c.append(float(b[item])/float(len(a)))

print "prob list"
print c

print (np.random.choice(np.arange(len(b)), 3, p=c, replace=False))

In this case im getting the random indexes of the array.

  • Is there any more optimized way not to have to calculate the probabilities array?

  • Also there is an issue that is that the prob array does not correspond to the Counter array.

ccamacho
  • 707
  • 8
  • 22

3 Answers3

0

If you aren't interested in the intermediate step of calculating frequencies, you could use random.shuffle (either on the list or a copy) and then slice off as many items as you need.

e.g.

import random
a =[(23, 11), (10, 16), (13, 11),  (12, 3), (4, 15), (10, 16), (10, 16)]
random.shuffle(a)
random_sample = a[0:3]
print(random_sample)

As shuffle reorders in place it will avoid the repetition issue, and statistically should give the same result (excluding differences in random number generation between np and random).

Lewis Fogden
  • 515
  • 5
  • 8
  • Actually this is not following the definition of not having repeated items this can produce i.e. [(10, 16), (10, 16), (23, 11)] – ccamacho Jan 22 '16 at 11:13
  • I've interpreted no repetition to mean each item in the original list can only sampled once. If you are sampling from the reduced list with probabilities then I think you would need to recalculate the distribution each time an item is removed? 3/7, 1/7, 1/7, 1/7, 1/7 -> 1/4, 1/4, 1/4, 1/4 if (10,16) is removed 3/7, 1/7, 1/7, 1/7, 1/7 -> 3/6, 1/6, 1/6, 1/6 if any other removed – Lewis Fogden Jan 22 '16 at 12:16
0

This will do the trick

from collections import Counter
import matplotlib.pyplot as plt
import numpy as np
import random

listOfNumbers =[(23, 11), (10, 16), (13, 11),  (10, 16), (10, 16), (10, 16), (10, 16)]
b = Counter(listOfNumbers)
c = []
pres=[]
for k,v in b.most_common():
    c.append(float(v)/float(len(listOfNumbers)))
    pres.append(k)

resultIndex = np.random.choice(np.arange(len(b)), 3, p=c, replace=False)

ass=[]
for res in resultIndex:
    ass.append(pres[res])

print ass

Now is just to see if is there any way to optimize it.

ccamacho
  • 707
  • 8
  • 22
0

You can repeat the following steps 3 times:

  1. Randomly chose a number i in range [0..n-1] where n is a current number of elements in a.
  2. Find a tuple on i-th position in the initial a list. Add tuple to resulting triplet.
  3. Remove all occurrences of tuple from a.

Pay attention to a corner case when a can be empty.

The overall time complexity will be O(n) for a list.

On the first step number i should be generated according to uniform distribution which regular random provides. The more occurrences of a particular tuple are in a the more likely it will be chosen.

Ivan Gritsenko
  • 4,166
  • 2
  • 20
  • 34