Find all the possible N-length anagrams - fast alternatives

Question

I am given a sequence of letters and have to produce all the N-length anagrams of the sequence given, where N is the length of the sequence.

I am following a kinda naive approach in python, where I am taking all the permutations in order to achieve that. I have found some similar threads like this one but I would prefer a math-oriented approach in Python. So what would be a more performant alternative to permutations? Is there anything particularly wrong in my attempt below?

from itertools import permutations
def find_all_anagrams(word):

pp = permutations(word)
perm_set = set()
for i in pp:
    perm_set.add(i)
ll = [list(i) for i in perm_set]
ll.sort()
print(ll)

See https://stackoverflow.com/questions/40752319/algorithm-to-list-unique-permutations-of-string-with-duplicate-letters/40756214#40756214 — Matt Timmermans, Jul 01 '17 at 02:41

score 3 · Answer 1 · edited Jul 01 '17 at 21:57

If there are lots of repeated letters, the key will be to produce each anagram only once instead of producing all possible permutations and eliminating duplicates.

Here's one possible algorithm which only produces each anagram once:

from collections import Counter

def perm(unplaced, prefix):
  if unplaced:
    for element in unplaced:
      yield from perm(unplaced - Counter(element), prefix + element)
  else:
    yield prefix

def permutations(iterable):
  yield from perm(Counter(iterable), "")

That's actually not much different from the classic recursion to produce all permutations; the only difference is that it uses a collections.Counter (a multiset) to hold the as-yet-unplaced elements instead of just using a list.

The number of Counter objects produced in the course of the iteration is certainly excessive, and there is almost certainly a faster way of writing that; I chose this version for its simplicity and (hopefully) its clarity

Thanks a lot I will dig the documentation starting from here. — py_script, Jun 30 '17 at 20:22
@jfsebastian: thanks. I have no idea how I managed to copy-and-paste so badly. — rici, Jul 02 '17 at 03:28

klutt · Answer 2 · 2017-06-30T19:52:22.383

This is very slow for long words with many similar characters. Slow compared to theoretical maximum performance that is. For example, permutations("mississippi") will produce a much longer list than necessary. It will have a length of 39916800, but but the set has a size of 34650.

>>> len(list(permutations("mississippi")))
39916800
>>> len(set(permutations("mississippi")))
34650

So the big flaw with your method is that you generate ALL anagrams and then remove the duplicates. Use a method that only generates the unique anagrams.

EDIT:

Here is some working, but extremely ugly and possibly buggy code. I'm making it nicer as you're reading this. It does give 34650 for mississippi, so I assume there aren't any major bugs. Warning again. UGLY!

# Returns a dictionary with letter count
# get_letter_list("mississippi") returns 
# {'i':4, 'm':1, 'p': 2, 's':4}
def get_letter_list(word):
    w = sorted(word)
    c = 0
    dd = {}
    dd[w[0]]=1
    for l in range(1,len(w)):
        if w[l]==w[l-1]:
            d[c]=d[c]+1
            dd[w[l]]=dd[w[l]]+1
        else:
            c=c+1
            d.append(1)
            dd[w[l]]=1
    return dd

def sum_dict(d):
    s=0
    for x in d:
        s=s+d[x]
    return s

# Recursively create the anagrams. It takes a letter list
# from the above function as an argument.
def create_anagrams(dd):
    if sum_dict(dd)==1: # If there's only one letter left
        for l in dd:
            return l # Ugly hack, because I'm not used to dics
    a = []
    for l in dd:
        if dd[l] != 0:
            newdd=dict(dd)
            newdd[l]=newdd[l]-1
            if newdd[l]==0:
                newdd.pop(l)
            newl=create(newdd)
        for x in newl:
            a.append(str(l)+str(x))
    return a

>>> print (len(create_anagrams(get_letter_list("mississippi"))))
34650

It works like this: For every unique letter l, create all unique permutations with one less occurance of the letter l, and then append l to all these permutations.

For "mississippi", this is way faster than set(permutations(word)) and it's far from optimally written. For instance, dictionaries are quite slow and there's probably lots of things to improve in this code, but it shows that the algorithm itself is much faster than your approach.

Could you give me an example? How will I know if it is a duplicate, if I dont generate it first? — py_script, Jun 30 '17 at 18:31
You don't need to know if there are duplicates. You only need an algorithm that does not create duplicates. Working on a better answer atm. — klutt, Jun 30 '17 at 18:48
Here's a [simple "all uniq anagrams" algorithm](https://codereview.stackexchange.com/a/52048/6143) (it excludes duplicates i.e., it generates only 34650 variants for *"mississippi"*). Though the time performance for short sequences may be worse than set(itertools.permutations(..)) — jfs, Jun 30 '17 at 19:26

score 0 · Answer 3 · answered Jun 30 '17 at 17:54

0

Maybe I am missing something, but why don't you just do this:

from itertools import permutations

def find_all_anagrams(word):
    return sorted(set(permutations(word)))

answered Jun 30 '17 at 17:54

Błotosmętek

12,717
19
29

It is not that I am not getting the values I want, it is performance issue. Is there any more performant alternative? This is kinda brute force – py_script Jun 30 '17 at 18:01

Kruupös · Answer 4 · 2017-06-30T18:11:24.677

0

You could simplify to:

from itertools import permutations

def find_all_anagrams(word):
    word = set(''.join(sorted(word)))
    return list(permutations(word))

In the doc for permutations the code is detailled and it seems already optimized.

edited Jun 30 '17 at 18:11

answered Jun 30 '17 at 17:57

Kruupös

5,097
3
27
43

What do you mean? I can only see how the permutations function is implemented not example sof usage? Am I missing something? – py_script Jun 30 '17 at 18:00
1

No you are not missing anything; but IMO most of the function implemented by very used library are over optimized and aleady follow mathematical logic. Anyway, if you want a faster alternatives, you will have to benchmark them to be sure. – Kruupös Jun 30 '17 at 18:06

Marco Luzzara · Answer 5 · 2017-06-30T18:36:28.327

I don't know python but I want to try to help you: probably there are a lot of other more performant algorithm, but I've thought about this one: it's completely recursive and it should cover all the cases of a permutation. I want to start with a basic example:

permutation of ABC

Now, this algorithm works in this way: for Length times you shift right the letters, but the last letter will become the first one (you could easily do this with a queue).

Back to the example, we will have:

ABC
BCA
CAB

Now you repeat the first (and only) step with the substring built from the second letter to the last one.

Unfortunately, with this algorithm you cannot consider permutation with repetition.

Find all the possible N-length anagrams - fast alternatives

5 Answers5

Linked