4

I am trying to figure out a way to generate all permutations possible of a string that has a couple repeating characters but without generating repeated tuples.

Right now I am using itertools.permutations(). It works but I need to remove repetition and I cannot use set() to remove the repetition.

What kind of results am I expecting? Well, for example, I want to get all the combinations for DDRR, the thing with itertools.permutations() is that I would get DDRR about four times, given that itertools sees the Ds as if they were different, same with Rs.

With list(itertools.permutations('DDRR')) I get:

[('D', 'D', 'R', 'R'), ('D', 'D', 'R', 'R'), ('D', 'R', 'D', 'R'), ('D', 'R', 'R', 'D'), ('D', 'R', 'D', 'R'), ('D', 'R', 'R', 'D'), ('D', 'D', 'R', 'R'), ('D', 'D', 'R', 'R'), ('D', 'R', 'D', 'R'), ('D', 'R', 'R', 'D'), ('D', 'R', 'D', 'R'), ('D', 'R', 'R', 'D'), ('R', 'D', 'D', 'R'), ('R', 'D', 'R', 'D'), ('R', 'D', 'D', 'R'), ('R', 'D', 'R', 'D'), ('R', 'R', 'D', 'D'), ('R', 'R', 'D', 'D'), ('R', 'D', 'D', 'R'), ('R', 'D', 'R', 'D'), ('R', 'D', 'D', 'R'), ('R', 'D', 'R', 'D'), ('R', 'R', 'D', 'D'), ('R', 'R', 'D', 'D')]

The ideal result I want is:

[('D', 'R', 'R', 'D'), ('R', 'D', 'R', 'D'), ('R', 'R', 'D', 'D'), ('D', 'R', 'D', 'R'), ('D', 'D', 'R', 'R'), ('R', 'D', 'D', 'R')]
  • Why are you unable to use `set`? What is bad about it? –  Jul 23 '16 at 17:26
  • Because I get a memory error. I am using very very long strings. – Eduardo J Castillo Jul 23 '16 at 17:28
  • It's a design choice, there's some workarounds to it, see: http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permutations-contain-duplicates-when-the-original – ifma Jul 23 '16 at 17:42
  • 1
    Aren't "very very long strings" always going to lead to impractically large output? A 52-character string which contains the alphabet twice has more than 10^60 permutations. – m69's been on strike for years Jul 23 '16 at 19:08
  • Yes, that's why we don't want to generate extra combinations that will make the output considerably larger. In the case of 'DDRR', instead of getting 4! without generating repetition I'd get 2! * 2!, That's a 18 combination difference, for only 4 characters. – Eduardo J Castillo Jul 23 '16 at 19:11
  • @nneonneo The question you linked to as being the original of this duplicate has an accepted answer, but doubts were raised in the comments about its correctness; can you confirm the answer is indeed correct and also answers this question? – m69's been on strike for years Jul 25 '16 at 05:38
  • 1
    I have answered the question about correctness (raised by freude) in that question. The algorithm is indeed correct, but requires the initial input be sorted. – nneonneo Jul 25 '16 at 05:44

3 Answers3

1

If your string contains a lot of repeated characters, then you can use a combinations-based algorithm to generate your permutations.

Basically, this works by choosing a letter and finding all the places where the duplicates of that letter can go. With each of those possibilities, you find all the places where the next letter goes, and so on.

Code:

from collections import Counter
from itertools import combinations

def perms_without_reps(s):
    partitions = list(Counter(s).items())
    k = len(partitions)
    def _helper(idxset, i):
        if len(idxset) == 0:
            yield ()
            return
        for pos in combinations(idxset, partitions[i][1]):
            for res in _helper(idxset - set(pos), i+1):
                yield (pos,) + res

    n = len(s)
    for poses in _helper(set(range(n)), 0):
        out = [None] * n
        for i, pos in enumerate(poses):
            for idx in pos:
                out[idx] = partitions[i][0]
        yield out

Run it like so:

for p in perms_without_reps('DDRR'):
    print p

Two important notes:

  • This doesn't generate output sorted in any particular way. If you want sorted output, add a permutations.sort() before k =, replace _helper(idxset - set(pos), i+1) with _helper(sorted(set(idxset) - set(pos)), i+1) and replace _helper(set(range(n)), 0) with _helper(list(range(n)), 0). This will make the function somewhat slower.
  • This function works very well if you have a large, unbalanced number of repeats. For example, any permutation-based method will just take forever on the input 'A'*100 + 'B'*2 (AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABB), whereas this method will finish nearly instantly with the 5151 unique permutations.
nneonneo
  • 171,345
  • 36
  • 312
  • 383
0

This would be a standard way to create permutations of a set with duplicate elements: count the number of occurances of each element (using e.g. an associative array or dictionary), loop through the elements in the dictionary and each time append the element to the permutation, and recurse with the rest of the dictionary. It's never going to be fast for very long input arrays, though; but then nothing probably will.

(Code example in JavaScript; I don't speak Python; translating should be easy enough.)

function permute(set) {
    var alphabet = {};
    for (var s in set) {
        if (!alphabet[set[s]]) alphabet[set[s]] = 1;
        else ++alphabet[set[s]];
    }                               // alphabet = {'a':5, 'b':2, 'r':2, 'c':1, 'd':1}
    perm("", set.length);

    function perm(part, level) {
        for (var a in alphabet) {   // every letter in alphabet
            if (alphabet[a]) {      // one or more of this letter available
                --alphabet[a];      // decrement letter count
                if (level > 1) perm(part + a, level - 1);  // recurse with rest
                else document.write(part + a + "<br>");    // deepest recursion level
                ++alphabet[a];      // restore letter count
            }
        }
    }
}
permute(['a','b','r','a','c','a','d','a','b','r','a']); // 83,160 unique permutations
                                                        // instead of 39,916,800 non-
                                                        // unique ones plus filtering
0

Generally every permutation algorithm is supposed to generate n!/(n-r)! outcomes, but you can decide to implement an algorithm that checks for repetition which should be fun.

Let's see if this helps.

def __permutate(self, objectToPermute):
    in_list = []

    for i in range(self.__cur, len(objectToPermute)):
        in_list.append(self.__swap(self.__cur, i, objectToPermute))

    ''' At initial call, self.__cur = 0
       and self.__permSize would be the r in the permutation formula '''
    if self.__cur < self.__permSize - 1:
        self.__cur += 1

        for obj in in_list:
            self.__permutate(obj)

        self.__cur -= 1


    if self.__cur == self.__permSize -1:
        for obj in in_list:
            #this does the job
            if self.norepeat and obj in self.__perm_obj:
                continue
            self.__perm_obj.append(obj[:self.__permSize])

You must have noticed, I pulled this out of a class I had written long ago, it's a Permutation algorithm I like to call the Melon algorithm (don't mind that).

What this part of the code does is simply recursively permute an object, a swap function was used as it was present in the class but you can easily code that out... Now to the main gist, To avoid repetition all you have to do is make sure the attrib self.norepeat = True and every repeated object would be skipped. If you'll be needing the class in full, I'll be glad to share

I'ud be needing a feedback so as to know if you get what I'm saying

Wells
  • 81
  • 4