3

I have a list with 200 elements. I want to randomly calculate 10% of all the combinations of length k of those elements and store the result in a list.

For example:

Assume 'ABCD' as in ['A', 'B', 'C', 'D'] and I want combinations of length 2. All possible combinations in this case would be 6 (n! / ((n-k)! x k!)). I want to get 10% of that which is 0.6 -> 1 (round up).

I tried itertools.combinations('ABCD', 2) but it gives me all combinations.


Here's some more information about my problem.

I have

all_points_coordinates =  [
    [-1.6339171050450814, 2.5160117038362722], 
    [-1.7207293090531386, 2.4574561328669748], 
    [0.10469849010750323, 2.9981724810572872],
]

and I want to calculate combinations of 3 of them and use

def all_way(points):
    point = len(points)
    allrout = []
    allrout = list(itertools.permutations(points, point))
    return allrout

but it gives me all combinations of my points. When I run it for 100 points it is very time consuming so I want to calculate just a limited number of these combinations.

PM 2Ring
  • 54,345
  • 6
  • 82
  • 182
  • Why not stop iterating after 10% of combinations have been found? I don't clearly see the problem here...can you elaborate further? – Tim Pietzcker Mar 10 '17 at 12:06
  • 2
    The first 10%? Your question is not entirely clear. – Willem Van Onsem Mar 10 '17 at 12:07
  • I mean for all combination of abcd in 4 is abcd abdc acbd acdb adbc adcb bacd badc bdac bdca bcad bcda cabd cadb cbad cbda cdab cdba dabc dacb dbac dbca dcab dcba but ı dont want do callculste all ı just want to 3 of them which badc bdac bdca and order not importem just random – Ozan Tunahan Ismailoglu Mar 10 '17 at 12:09
  • I mean calculating 2000 points all combination time consuming ı just need limitied number ( 3,4) in all combination – Ozan Tunahan Ismailoglu Mar 10 '17 at 12:13
  • 1
    `itertools.combinations` returns an iterator, it doesn't create all the combinations all at once. So just loop over the combinations and break out of the loop when you have enough combinations. They will be in order, not random. Does that matter? – PM 2Ring Mar 10 '17 at 12:15
  • `create_combination(my_list[:int(len(my_list)*percent)])` first %10 is equal to first %10 elements combinations. So your question is fuzzy which pattern iterate a list for limited combinations ? – dsgdfg Mar 10 '17 at 12:17
  • `itertools.combinations('ABCD', 2)[:(len('ABCD')*percent)]` – dsgdfg Mar 10 '17 at 12:23
  • The keyword to search is "random combination". [These](https://stackoverflow.com/questions/2394246/algorithm-to-select-a-single-random-combination-of-values) [pages](https://stackoverflow.com/questions/3722430/most-efficient-way-of-randomly-choosing-a-set-of-distinct-integers) [may](https://math.stackexchange.com/questions/178690/whats-the-proof-of-correctness-for-robert-floyds-algorithm-for-selecting-a-sin) help. – Cong Ma Mar 10 '17 at 12:27
  • You might be interested in [`random.choice`](https://docs.python.org/3/library/random.html#random.choice) – Tobias Kienzler Mar 10 '17 at 12:29
  • 1
    @OzanTunahanIsmailoglu Could it be the case that you have duplicates in your list? – Ma0 Mar 10 '17 at 12:35

5 Answers5

4

We can generate the random combinations using random.sample, and use a set to ensure we don't generate any combination more than once. Here's a simple demo.

from random import seed, sample

seed(42)

def random_combinations(seq, size, num):
    combos = set()
    while len(combos) < num:
        item = sample(seq, size)
        combos.add(tuple(item))
    return list(combos)

# test

data = [
    (0, 1), (2, 3), (4, 5), (6, 7), (8, 9), 
    (10, 11), (12, 13), (14, 15), (16, 17), (18, 19),
]

# Make 20 random 3-element combinations
combos = random_combinations(data, 3, 20)
for i, item in enumerate(combos, 1):
    print('{:>2}: {}'.format(i, item))

output

 1: ((2, 3), (12, 13), (8, 9))
 2: ((6, 7), (18, 19), (4, 5))
 3: ((2, 3), (16, 17), (18, 19))
 4: ((0, 1), (4, 5), (12, 13))
 5: ((14, 15), (10, 11), (4, 5))
 6: ((2, 3), (0, 1), (8, 9))
 7: ((6, 7), (16, 17), (0, 1))
 8: ((12, 13), (2, 3), (8, 9))
 9: ((6, 7), (14, 15), (8, 9))
10: ((10, 11), (18, 19), (8, 9))
11: ((0, 1), (14, 15), (2, 3))
12: ((18, 19), (10, 11), (6, 7))
13: ((18, 19), (12, 13), (0, 1))
14: ((10, 11), (8, 9), (4, 5))
15: ((8, 9), (2, 3), (6, 7))
16: ((2, 3), (0, 1), (6, 7))
17: ((16, 17), (6, 7), (12, 13))
18: ((2, 3), (12, 13), (18, 19))
19: ((0, 1), (2, 3), (6, 7))
20: ((6, 7), (10, 11), (2, 3))

As tobias_k mentions in the comments, this code is only suitable when num is not too close to the total number of combinations. If you want < 50% of the total number of combinations it should be fine, but beyond that it will have a high chance of re-generating combinations that it's already generated, which will cause it to loop for a long time.


Note that this code considers ((2, 3), (12, 13), (8, 9)) to be distinct from a tuple containing those 3 pairs in a different order, eg ((2, 3), (8, 9), (12, 13)).

If you don't want that we can make our items into sets. We need to use frozenset for this, since normal sets are mutable and therefore unhashable and hence cannot be set items.

from random import seed, sample

seed(42)

def random_combinations(seq, size, num):
    combos = set()
    while len(combos) < num:
        item = sample(seq, size)
        combos.add(frozenset(item))
    return [tuple(u) for u in combos]

# test

data = [
    (0, 1), (2, 3), (4, 5), (6, 7), (8, 9), 
    (10, 11), (12, 13), (14, 15), (16, 17), (18, 19),
]

# Make 20 random 3-element combinations
combos = random_combinations(data, 3, 20)
for i, item in enumerate(combos, 1):
    print('{:>2}: {}'.format(i, item))

output

 1: ((0, 1), (2, 3), (6, 7))
 2: ((0, 1), (2, 3), (8, 9))
 3: ((16, 17), (6, 7), (0, 1))
 4: ((12, 13), (2, 3), (18, 19))
 5: ((12, 13), (2, 3), (8, 9))
 6: ((12, 13), (18, 19), (0, 1))
 7: ((8, 9), (4, 5), (10, 11))
 8: ((16, 17), (2, 3), (18, 19))
 9: ((8, 9), (6, 7), (14, 15))
10: ((0, 1), (4, 5), (12, 13))
11: ((8, 9), (10, 11), (18, 19))
12: ((10, 11), (6, 7), (2, 3))
13: ((0, 1), (14, 15), (2, 3))
14: ((10, 11), (18, 19), (6, 7))
15: ((8, 9), (2, 3), (6, 7))
16: ((4, 5), (6, 7), (18, 19))
17: ((8, 9), (4, 5), (2, 3))
18: ((16, 17), (4, 5), (6, 7))
19: ((16, 17), (6, 7), (12, 13))
20: ((4, 5), (10, 11), (14, 15))
PM 2Ring
  • 54,345
  • 6
  • 82
  • 182
  • 1
    Nice, but you might add a warning not to use this with `num` being close to the total number of combinations, as finding the last few not-yet-seen combinations might take a long time. – tobias_k Mar 10 '17 at 13:17
  • but it is work for set ı have list how can ı solve this question. ı have data=[[9, -3], [5, 8], [-6, 7]] – Ozan Tunahan Ismailoglu Mar 10 '17 at 19:06
  • It's generally better to store coordinate pairs as tuples instead of lists. See [here](http://stackoverflow.com/q/626759/4014959) for details. And my code needs the data in that form, but it's easy to do the conversion: `newdata = [tuple(u) for u in olddata]`. And of course you can do the reverse conversion like this `[list(u) for u in newdata]`. – PM 2Ring Mar 11 '17 at 10:50
2

Another rather simple possibility: Generate all the combinations, but only keep those where some random variable is < 0.1 to get (roughly) 10% of the resulting combinations.

>>> sum(1 for _ in itertools.combinations(range(100), 3)) # total count for comparison
161700
>>> res = [c for c in itertools.combinations(range(100), 3) if random.random() < 0.1]
>>> len(res)
16227

Compared with using random.sample, this has the advantage that it does not need to keep all the combinations in memory, although it will still generate all the combinations, but discard 90% of them immediately. Also, the result will be only roughly 10% of the combinations, but not exactly. For large numbers, that should not be too much of a problem, though.

tobias_k
  • 81,265
  • 12
  • 120
  • 179
  • Nice approach. Hopefully, this will suit the OP's needs. I'll post a version using `sample` shortly. – PM 2Ring Mar 10 '17 at 13:10
  • The total number of combinations can be analytically calculated in case the `sum` take a long time – Ma0 Mar 10 '17 at 13:25
  • @Ev.Kounis Sure, just show the number for reference. For actually getting the 10%, I do not need the sum. – tobias_k Mar 10 '17 at 13:36
0

If you don't want to calculate all combinations in advance before picking a small subset of them, you have two options:

  1. Ugly, but does the job

    • Shuffle the elements of your list. Add the result to a set
    • Continue until the set has the desired length.
  2. Beautiful but complicated

    • Create a list of indices < n! (where n is the number of elements in your list)
    • Calculate the combination for each index (similar to this question.
Community
  • 1
  • 1
Tim Pietzcker
  • 328,213
  • 58
  • 503
  • 561
  • I'm familiar with [permutation indices](http://stackoverflow.com/a/28525468/4014959), but how do you apply that to combinations? – PM 2Ring Mar 10 '17 at 12:28
  • Option 1 does not guarantee that there will be no duplicates though, right? – Ma0 Mar 10 '17 at 12:31
  • 2
    @Ev.Kounis: Yes, it does by using a set. The ugliness of this solution is that it becomes very inefficient when the percentage goes up because it has to throw away more and more duplicates. – Tim Pietzcker Mar 10 '17 at 12:32
0

Option 1 (not random but generates what is needed only):

Take the first 10% percent of the results returned by itertools.combinations().

import itertools
from math import factorial, ceil    

original = 'ABCD'
k = 2
percentage = 0.1
configurations = factorial(len(original)) / (factorial(len(original) - k) * factorial(k))
take = ceil(percentage * configurations)
res = []
for i, comb in enumerate(itertools.combinations(original, k), 1):
    res.append(comb)
    if i == take:
        break
print(res, len(res))

Option 2 (random but generates the full list first):

Take 10% percent of the results returned by itertools.combinations() randomly. Requires Python 3.6 because of random.choices()

# Python 3.6 you can do this
import random
import itertools
from math import factorial, ceil

original = 'ABCD'
k = 2
percentage = 0.1
configurations = factorial(len(original)) / (factorial(len(original) - k) * factorial(k))
take = ceil(percentage * configurations)
res = random.choices([x for x in itertools.combinations(original, k)], k=take)

original can also be a list.

Community
  • 1
  • 1
Ma0
  • 15,057
  • 4
  • 35
  • 65
  • But version 1 is not random, and version 2 needs to generate all the combinations first. – tobias_k Mar 10 '17 at 12:26
  • @tobias_k True. Added comments in the answer to make it clear. 200 elements are not that many but OP didn't specify how many elements the combinations will consist of so runtime cannot be estimated easily – Ma0 Mar 10 '17 at 12:29
  • It's not clear what the OP actually wants. My _guess_ is that they want a random 10% of the (roughly) 200,000 pairs of points from their 200 point list. But I Could Be Totally Wrong. ;) – PM 2Ring Mar 10 '17 at 12:34
  • BTW, why are you using `random.choices`? Shouldn't `random.sample` so the same? – tobias_k Mar 10 '17 at 13:16
  • @tobias_k The only difference is replacement.`sample` is without replacement, `choices` with. – Ma0 Mar 10 '17 at 13:19
  • @Ev.Kounis Ah, right, did not see this. But then, why `choices` and not `sample`? – tobias_k Mar 10 '17 at 13:21
0

ı solve my problem like this

point=  len(points)
p=int(point*10/100)
allrout = list(itertools.islice(itertools.permutations(points, point),p ))
print(len(allrout))
return allrout