3

I have created a script that uses the following code to iterate over all combinations of characters in the sCharacters string:

sCharacters = "abcdefghijklmnopqrstuvwxyz0123456789"
iKeyLength = len(sCharacters)

for sCharacterCombination in itertools.product(sCharacters, repeat=iKeyLength):
    # Do Stuff Here

However I am only interested in combinations where no single character is represented more than n number of times in the sCharacterCombination. Eg; I want to filter out strings like aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab and only get ones like 7efad3ca411bf57f1df57c0b4e9282aa

I tried just checking each sCharacterCombination, but this is no good as I still have to iterate over a mountain of items I will never use.

How can I get the iterator to create the list based on each item having no single character represented more than n number of times in the first place, so I don't have to iterate over items I will not use?

It would be awesome if I could say:

  • Max number of times a single character could be represented in the sCharacterCombination.
  • Max times a single character could be represented in a row.

That would allow me to say a single character could appear in the sCharacterCombination a max of four times, but it could not be in a row more than twice. E.g. This is ok 1121... but this is not 1112....

Thank you for your time.

user2109254
  • 1,709
  • 2
  • 30
  • 49

1 Answers1

2

Here's some code that's more efficient than your current approach.

First, we use itertool's combinations_with_replacement function to create combinations of the desired length, filtering out combinations that have more than the desired number of repetitions. Then we permute each combination; the permutation algorithm I use (created by the 14th century Indian mathematician Narayana Pandita) handles repetitions properly, unlike the one in itertools. Then we use itertool's groupby to filter out permutations that contain runs of identical characters that are greater than the permitted run length.

I've included two functions: permutations_with_limited_repetition limits the length of runs of identical characters; permutations_with_repetition does not.

Note that the input sequence must be sorted from low to high or this algorithm will not function correctly.

from itertools import combinations_with_replacement, groupby

def lexico_permute(a):
    a = list(a)
    yield a
    n = len(a) - 1
    while True:
        for j in range(n-1, -1, -1):
            if a[j] < a[j + 1]:
                break
        else:
            return

        v = a[j]
        for k in range(n, j, -1):
            if v < a[k]:
                break

        a[j], a[k] = a[k], a[j]
        a[j+1:] = a[j+1:][::-1]
        yield a

def permutations_with_repetition(seq, length, maxrepeat): 
    for combo in combinations_with_replacement(seq, length):
        if any(combo.count(c) > maxrepeat for c in combo):
            continue
        yield from lexico_permute(combo)

def permutations_with_limited_repetition(seq, length, maxrepeat, maxrun): 
    for combo in combinations_with_replacement(seq, length):
        if any(combo.count(c) > maxrepeat for c in combo):
            continue
        for perm in lexico_permute(combo):
            if any(len(list(g)) > maxrun for _, g in groupby(perm)):
                continue
            yield perm

# Test

src = list('abcde')
for lst in permutations_with_limited_repetition(src, 3, 2, 1):
    print(''.join(lst))

output

aba
aca
ada
aea
bab
abc
acb
bac
bca
cab
cba
abd
adb
bad
bda
dab
dba
abe
aeb
bae
bea
eab
eba
cac
acd
adc
cad
cda
dac
dca
ace
aec
cae
cea
eac
eca
dad
ade
aed
dae
dea
ead
eda
eae
bcb
bdb
beb
cbc
bcd
bdc
cbd
cdb
dbc
dcb
bce
bec
cbe
ceb
ebc
ecb
dbd
bde
bed
dbe
deb
ebd
edb
ebe
cdc
cec
dcd
cde
ced
dce
dec
ecd
edc
ece
ded
ede

For a commented (non-generator) version of the permutation algorithm, please see this answer I wrote last year.


Update

The first filter

if any(combo.count(c) > maxrepeat for c in combo):

can be made more efficient by using the same groupby technique as the second filter:

if any(len(list(g)) > maxrepeat for _, g in groupby(combo)):

(I should have thought of that yesterday, but I was originally not going to do that second filter, it was a last-minute inspiration).

Community
  • 1
  • 1
PM 2Ring
  • 54,345
  • 6
  • 82
  • 182
  • thanks for your great answer. I was hoping there might be another lib out there that would do it. Like at the level that itertools.product works out all the unique combinations... at that level the algo that determines every unique combination could be modified to take into account maxrepeat and maxrun. Having said that, I guess if the itertools are based on nested loops then you still have to iterate over everything at that level to exclude items based on rules.... – user2109254 Jun 12 '16 at 01:39
  • @user2109254: No worries. I've just added a small optimization. There _may_ be a significantly more efficient algorithm known to the combinatorics gurus, but if such a thing exists I've never heard of it. :) My algorithm should be much faster than your product-based approach, especially for large source strings, since it eliminates many combinations before permuting them. – PM 2Ring Jun 12 '16 at 06:40
  • (cont) itertools is written in C, so its loops run faster than pure Python loops, which means my `any(... groupby...)` filters are quite efficient. I guess it wouldn't be too hard to hack the `combinations_with_replacement` function to incorporate the `maxrepeat` filter, but I don't think it'd gain you much speed. You'd possibly get more of a speed boost by implementing `lexico_permute` in C, though. – PM 2Ring Jun 12 '16 at 06:42
  • Thanks for your feed back @PM 2Ring, and taking the time to put together an answer. – user2109254 Jun 12 '16 at 09:59