
Note: This is not a duplicate question, despite what the title might suggest.

I have a list of lists, and I need to get all combinations from it with replacement.

import itertools

l = [[1,2,3] ,[1,2,3],  [1,2,3]]
n = []
for i in itertools.product(*l):
    if sorted(i) not in n:
        n.append(sorted(i))
for i in n:
    print(i)

[1, 1, 1]
[1, 1, 2]
[1, 1, 3]
[1, 2, 2]
[1, 2, 3]
[1, 3, 3]
[2, 2, 2]
[2, 2, 3]
[2, 3, 3]
[3, 3, 3]

Thanks to @RoadRunner and @Idlehands.

The above code works, but it has two problems:

  1. For large lists, itertools.product throws a MemoryError. When l has 18 sublists of length 3, there are ~387 million combinations.

  2. Order matters, so sorted alone would not work for my problem. This could be confusing, so I explain with the example below.

    l = [[1,2,3], [1], [1,2,3]]

Here I have 2 unique groups:

Group 1: positions 0 and 2, which have the same value [1,2,3]

Group 2: position 1, which has the value [1]

Thus, the solutions I need are:

[1,1,1]
[1,1,2]
[1,1,3]
[2,1,2]
[2,1,3]
[3,1,3]

Thus position 1 is fixed to 1.

Hope this example helps.
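One way to formalize this grouping (a sketch of my own, not from the original post): collect the positions whose sublists are identical, then sort values only within each such group when building the deduplication key:

```python
from itertools import product
from collections import defaultdict

l = [[1, 2, 3], [1], [1, 2, 3]]

# Group positions that hold identical sublists; only values at
# interchangeable positions may be reordered when deduplicating.
groups = defaultdict(list)
for pos, sub in enumerate(l):
    groups[tuple(sub)].append(pos)

def canonical(t):
    # Sort the values within each group of interchangeable positions.
    key = list(t)
    for positions in groups.values():
        vals = sorted(key[p] for p in positions)
        for p, v in zip(positions, vals):
            key[p] = v
    return tuple(key)

seen = set()
results = []
for t in product(*l):
    c = canonical(t)
    if c not in seen:
        seen.add(c)
        results.append(t)

print(results)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (2, 1, 2), (2, 1, 3), (3, 1, 3)]
```

This reproduces exactly the six solutions listed above, with position 1 fixed to 1.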

joel.wilson

3 Answers


What about grouping sequences with the same elements in different order with a collections.defaultdict, then picking the first element from each key:

from itertools import product
from collections import defaultdict

l = [[1] ,[1,2,3],  [1,2,3]]

d = defaultdict(list)
for x in product(*l):
    d[tuple(sorted(x))].append(x)

print([x[0] for x in d.values()])

Which gives:

[(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]

Alternatively, this can also be done with keeping a set of what has been added:

from itertools import product

l = [[1] ,[1,2,3],  [1,2,3]]

seen = set()
combs = []

for x in product(*l):
    curr = tuple(sorted(x))
    if curr not in seen:
        combs.append(x)
        seen.add(curr)

print(combs)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]

If you don't want to sort, consider using a frozenset with collections.Counter():

from collections import Counter
from itertools import product

l = [[1] ,[1,2,3],  [1,2,3]]

seen = set()
combs = []

for x in product(*l):
    curr = frozenset(Counter(x).items())

    if curr not in seen:
        seen.add(curr)
        combs.append(x)

print(combs)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]

Note: You can also use setdefault() for the first approach, if you don't want to use a defaultdict().
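For reference, the setdefault() variant mentioned above might look like this sketch:

```python
from itertools import product

l = [[1], [1, 2, 3], [1, 2, 3]]

d = {}
for x in product(*l):
    # setdefault() creates the empty list the first time a key is seen
    d.setdefault(tuple(sorted(x)), []).append(x)

print([v[0] for v in d.values()])
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]
```

Since plain dicts preserve insertion order in Python 3.7+, the output order matches the defaultdict version.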

RoadRunner
  • When my l is of length 18 with the sublist of length 3 each, total combinations are 3^18. Itertools throws MemoryError. Any thought to do this through `recursive` or 'dynamic` for loops ? Even I'm thinking – joel.wilson Jan 22 '18 at 06:33
  • I was checking through the solution and saw the order gets `sorted`. Is there a way to avoid this. I have edited my question to better explain the problem. – joel.wilson Jan 22 '18 at 07:13
  • frozenset is good approach !! Solves problem 2. How can we avoid `itertools` ? Because if you replace l with `[[1,2,3] for i in range(18)]` You will get MemoryError – joel.wilson Jan 22 '18 at 07:34

Edited Answer:

Based on the new information, in order to handle the enormous number of combinations overloading itertools.product(), we can try pulling results in small batches:

from itertools import product
l = [list(range(3))]*18
prods = product(*l)
uniques = set()
results = []
totals = 0

def run_batch(n=1000000):
    global totals
    for i in range(n):
        try:
            result = next(prods)  # pull one product lazily
        except StopIteration:
            break
        unique = tuple(sorted(result))
        if unique not in uniques:
            uniques.add(unique)
            results.append(result)
    # i is the last loop index, so a completed batch reports n - 1
    totals += i

run_batch()
print('Total iteration this batch: {0}'.format(totals))
print('Number of unique tuples: {0}'.format(len(uniques)))
print('Number of wanted combos: {0}'.format(len(results)))

Output:

Total iteration this batch: 999999
Number of unique tuples: 103
Number of wanted combos: 103
First 10 results:
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2)

Here we can control the batch size by calling next(prods) for the range of your choice, and continue as you see fit. The uniques are sorted tuples kept in a set as a reference point, and the results stay in the proper order you wanted. Both sizes should be the same, and they are surprisingly small when I ran with the list of 3^18. I'm not well acquainted with memory allocation, but this way the program shouldn't store all the unwanted results in memory, so you should have more wiggle room. Otherwise, you can always opt to export the results to a file to make room. Obviously this sample only shows the length of the list, but you can easily display/save the results for your own purpose.
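If even the deduplicated results outgrow memory, the same loop could stream each new combination straight to a file instead of keeping a results list. A sketch (the file name is illustrative, and a smaller pool is used so the demo finishes quickly):

```python
from itertools import product

l = [list(range(3))] * 6  # smaller pool for the demo

seen = set()
written = 0
with open("combos.txt", "w") as f:
    for x in product(*l):
        key = tuple(sorted(x))
        if key not in seen:
            seen.add(key)
            # write the combination out instead of holding it in a list
            f.write(",".join(map(str, x)) + "\n")
            written += 1

print(written)  # 28 unique multisets for 6 draws from 3 values
```

Only the set of sorted keys has to stay in memory, which grows with the number of unique combinations rather than the full product.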

I can't argue this is the best or most optimized approach, but it seems to work for me. Maybe it'll work for you? This batch took roughly 10s to run 5 times (avg ~2s per batch). The entire set of prods took me 15 minutes to run:

Total iteration: 387420102
Number of unique tuples: 190
Number of wanted combos: 190
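Worth noting (my observation, not part of the original answer): when all 18 sublists are identical, as in this stress test, those 190 unique tuples can be generated directly with itertools.combinations_with_replacement, skipping the 387-million-element product entirely. This shortcut only applies when every position draws from the same pool:

```python
from itertools import combinations_with_replacement

# Every unordered multiset of 18 values drawn from range(3)
# corresponds to exactly one unique sorted tuple.
combos = list(combinations_with_replacement(range(3), 18))
print(len(combos))  # 190
```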

Original Answer:

@RoadRunner had a neat solution with sort() and defaultdict, but I feel the latter was not needed. I leveraged his sort() suggestion and implemented a modified version here.

From this answer:

import itertools

l = [[1], [1, 2, 3], [1, 2, 3]]
n = []
for i in itertools.product(*l):
    if sorted(i) not in n:
        n.append(sorted(i))
for i in n:
    print(i)

Output:

[1, 1, 1]
[1, 1, 2]
[1, 1, 3]
[1, 2, 2]
[1, 2, 3]
[1, 3, 3]
r.ook
  • `(1,1,2)` is same as `(1,2,1)` ! – joel.wilson Jan 22 '18 at 06:02
  • I'm testing all the scenarios I have using all answers posted here and will accept one. Thank you very much guys ! – joel.wilson Jan 22 '18 at 06:17
  • @Idlehands Yeah realised the `defaultdict` approach is not the best way. Simply keeping track of seen items with a set is good enough. – RoadRunner Jan 22 '18 at 06:17
  • When my l is of length 18 with the sublist of length 3 each, total combinations are 3^18. Itertools throws MemoryError. Any thought to do this through recursive or 'dynamic` for loops ? Even I'm thinking – joel.wilson Jan 22 '18 at 06:33
  • Don't repeatedly use `in` on a list, especially once we know that length/performance is a concern! This will be much faster using `n = set()` and `n.add(sorted(i))` - that's just a more explicit way of doing `set(sorted(i) for i in itertools.product(*l))`, though. – Peter DeGlopper Jan 22 '18 at 06:47
  • @PeterDeGlopper `set(sorted(i) for i in itertools.product(*l))` won't work here, since lists are not hashable. I think you mean `tuple(sorted(i))` instead. – RoadRunner Jan 22 '18 at 06:56
  • I was checking through the solution and saw the order gets `sorted`. Is there a way to avoid this. I have edited my question to better explain the problem. – joel.wilson Jan 22 '18 at 07:13
  • Yeah, fair call on hashability issue, I'll update my answer. – Peter DeGlopper Jan 22 '18 at 07:18
  • @Joel.Wilson is your live case a 18-1-18 combination as well, where both of the first and end group are identical? If is it viable to do a permutation on the first and last group first and then insert the middle group after? – r.ook Jan 22 '18 at 14:12
  • @Idlehands it's not. Everything is dynamic here. – joel.wilson Jan 22 '18 at 15:09
  • @joel.wilson I'm a bit confused. Your additional condition of (1, 1, 2) = (2, 1, 1) means that you are expecting identical elements, but if everything is dynamic how do you determine what's identical? Do you sometimes pass the same `list` object into the function more than once? Reason I ask is generating 387mil output (assuming that's not even your max case) is not trivial - whatever helps to reduce the duplicates on the way would help immensely. Say if I have `a = [1,2,3]` and `b = [1,2,3]`, is `f(a,a)` expected to be less than `f(a,b)` because one is by reference and one is by value? – r.ook Jan 22 '18 at 15:57
  • @Idlehands I have to agree with you here. The only thing I can think of is tweaking the `itertools.product` code [here](https://docs.python.org/3/library/itertools.html#itertools.product) to remove duplicates before they get yielded. – RoadRunner Jan 22 '18 at 16:34
  • @RoadRunner that's similar to my approach as well but I'm having trouble narrowing down the requirements for "identical items" as they were a bit muddy. Didn't think to go back to the source on `product()` though, that's great. – r.ook Jan 22 '18 at 16:46
  • regarding your query on identical elements : think of 3 machines`(m1,m2,m3)` and each of these machines can be asigned to either of `(g1,g2,g3)` except for `m2` which can only be part of `g2` because of some characteristics : like `m1&m3` take `t1` sec to perform a job and hence are equivalent. But `m2` takes `t2` sec and hence belongs to different category. Therefore, `1st` and `3rd`position can be interchanged but not the second one. This is convincing ? – joel.wilson Jan 22 '18 at 17:41
  • @joel.wilson see my edit. I think I might have something. – r.ook Jan 23 '18 at 03:21
  • hey thanks ! Though even this leads to MemoryError for me, Its soo great to see all these approaches I am waiting for soem more time and then will be closing this question by accepting – joel.wilson Jan 23 '18 at 18:08
  • At what point did the `MemoryError` occur for you? When I ran the 387mil batch in 1mil increments it was fine, I'm surprised. With your comment I did a little bit of memory monitoring and it's only using 13% of my CPU with the 3^18 criteria. – r.ook Jan 23 '18 at 18:56

For short input sequences, this can be done by filtering the output of itertools.product to just the unique values. One not optimized way is set(tuple(sorted(t)) for t in itertools.product(*l)), converting to a list if you like.

If you have enough of a Cartesian product fanout that this is too inefficient, and if your input example showing the sublists as sorted is something you can rely on, you could borrow a note from the docs' discussion of permutations and filter out non-sorted values:

The code for permutations() can be also expressed as a subsequence of product(), filtered to exclude entries with repeated elements (those from the same position in the input pool)

So you'd want a quick test for whether a value is sorted or not, something like this answer: https://stackoverflow.com/a/3755410/2337736

And then list(t for t in itertools.product(*l) if is_sorted(t))
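A sketch of that filter, with is_sorted written along the lines of the linked answer:

```python
from itertools import product

def is_sorted(t):
    # True when every element is <= its successor
    return all(a <= b for a, b in zip(t, t[1:]))

l = [[1, 2, 3]] * 3
print([t for t in product(*l) if is_sorted(t)])
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3),
#  (1, 3, 3), (2, 2, 2), (2, 2, 3), (2, 3, 3), (3, 3, 3)]
```

This still iterates the whole product, but keeps only one representative per multiset and never stores the rejects.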

Beyond that, I think you'd have to get into recursion or a fixed length of l.

Peter DeGlopper
  • The last point is exactly my challenge. I have a list of length 18, each sublist of length 3, thus amounting to 3^18 combinations. I was wondering about recursive or dynamic for loops. – joel.wilson Jan 22 '18 at 06:35
  • So you have 18 total elements split between 6 3-length sublists, or you have 18 3-length sublists for 54 total? The former is trivial to generate and then filter, performance/memory wise - it's only 729 candidates. The second, well, it's about 387 million elements - possible but it might take some tuning. – Peter DeGlopper Jan 22 '18 at 06:49
  • I have the 387mil case one. How can we tune them ? – joel.wilson Jan 22 '18 at 06:50
  • ...hang on, that's 387 million 18-element tuples. About 7 billion numbers to work with. If they're really ints, as in this example...well, my system shows 28 bytes for an int. Even if tuples were free, and they're not, that's approximately 196 GB. No wonder `itertools.product` is blowing up. I withdraw the "possible but it might take some tuning" comment. "Possible but with a lot of work first" is probably more accurate. If I were in your shoes I'd try to rework the problem if at all possible. – Peter DeGlopper Jan 22 '18 at 07:48
  • `I'd try to rework the problem if at all possible` thats what I'm trying – joel.wilson Jan 22 '18 at 08:40