2

I would like to avoid redundancy into all possible combinations in a list of string (for example 1122 is the same thing as 2211 in my context therefore only one or the other should be in the resulted list).

I would also like to apply a filter during the combination. For example, I don't want to have any string in the result that contains 3.

How should I handle this logic?

This code is doing the combination

>>> keywords = [''.join(i) for i in itertools.product(['11', '22', '33'], repeat = 2)]
>>> keywords
['1111', '1122', '1133', '2211', '2222', '2233', '3311', '3322', '3333']
Vadim Kotov
  • 8,084
  • 8
  • 48
  • 62
Michael
  • 2,436
  • 1
  • 36
  • 57
  • should `2222` and `3333` also be skipped? – RomanPerekhrest Dec 14 '17 at 14:05
  • Maybe loop and sort each string, add original string if sorted string doesn't exist in new list: https://stackoverflow.com/questions/15046242/how-to-sort-the-letters-in-a-string-alphabetically-in-python I can write up a better answer if this looks like it's the right direction you want to go. – sniperd Dec 14 '17 at 14:05
  • any string that contains a 3 should be dropped – Michael Dec 14 '17 at 14:09
  • 1
    If you don't want 3 in the output you should filter such strings out of the _input_ to `product`. – PM 2Ring Dec 14 '17 at 14:09
  • Will `repeat` always be 2? – PM 2Ring Dec 14 '17 at 14:13
  • yes always two - I cannot filter from the `input` because this is a toy example, in my project, I am actually working with a pandas data frame and the filter criteria are much more complex and involve calculation between each element of the combination. – Michael Dec 14 '17 at 14:16

3 Answers3

2

Depending on your actual data there may be a more efficient way to do this, but the algorithm below will work. We eliminate the duplicates by a simple comparison. I've put the check for '3's into a function. That's slightly slower than doing it directly in the list comprehension, but it makes the code more general.

import itertools

bases = ['11', '22', '33', '44']

def is_ok(u, v):
    ''' Test if a u,v pair is permitted '''
    return not ('3' in u or '3' in v)

keywords = [u+v for u, v in itertools.product(bases, repeat = 2) if u <= v and is_ok(u, v)]

output

['1111', '1122', '1144', '2222', '2244', '4444']
print(keywords)
PM 2Ring
  • 54,345
  • 6
  • 82
  • 182
2

The same is possible by filtering itertools.combinations_with_replacement:

Code

import itertools as it


bases = ["11", "22", "33", "44"]

[x+y for x,y in it.combinations_with_replacement(bases, 2) if "3" not in x+y]
# ['1111', '1122', '1144', '2222', '2244', '4444']

This version is more general and does not rely on comparing numeric strings.


Details

From the docs we can understand why this works:

The code for combinations_with_replacement() can be also expressed as a subsequence of product() after filtering entries where the elements are not in sorted order (according to their position in the input pool)

def combinations_with_replacement(iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    for indices in product(range(n), repeat=r):
        if sorted(indices) == list(indices):
            yield tuple(pool[i] for i in indices)

In this manner, each item is associated with a unique index. When the indices of two items are compared, only sorted combinations are used to yield an item. The remaining indices are discarded as already seen.

(0, 0)
(0, 1)
(0, 2)
(0, 3)
(1, 0)                                         # discarded
(1, 1)
...

See the docs for more details on the similarity between this tool and itertools.product.

pylang
  • 40,867
  • 14
  • 129
  • 121
0

This should do what you want:

import itertools

def main():

    newkeywords = []
    keywords = ["".join(i) for i in itertools.product(["11", "22", "33"], repeat = 2)]
    for item in keywords:
        newitem = "".join(sorted(item))
        if "3" not in newitem and newitem not in newkeywords:
            newkeywords.append(newitem)
    print(newkeywords)

main()

results:

['1111', '1122', '2222']

It makes a new list, and if that sorted item (which makes 1122 the same as 2211) exists or the number "3" exists, don't add it to the new list.

sniperd
  • 5,124
  • 6
  • 28
  • 44