2

If my input is a list like this:

words = ['cat','act','wer','erw']

I want to make a list of lists of anagrams like this -

[['cat','act'],['wer','erw']] 

I have tried to do something like this:

[[w1 for w in words if w!=w1 and sorted(w1)==sorted(w)] for w1 in words]

but it doesn't work. The output was :

[['cat'], ['act'], ['wer'], ['erw']]

In addition, I don`t want to use any import (except string). What is the mistake?

marshall.ward
  • 6,758
  • 8
  • 35
  • 50
Eyal Dreifuss
  • 29
  • 1
  • 5

4 Answers4

3

Be aware that your original method is actually O(#words2) time and thus will not work on large datasets of perhaps more than 10000 words.


groupby one-liner:

One of the most elegant weirdest use cases I've ever seen for itertools.groupby:

>>> [list(v) for k,v in groupby(sorted(words,key=sorted),sorted)]
[['cat', 'act'], ['wer', 'erw']]

defaultdict three-liner:

Using collections.defaultdict, you can do:

anagrams = defaultdict(list)
for w in words:
    anagrams[tuple(sorted(w))].append(w)

As for If doing it your original way without any imports, you can emulate collections.defaultdict as follows:

anagrams = {}
for w in words:
    key = tuple(sorted(w))
    anagrams.setdefault(key,[]).append(w)

example:

>>> anagrams
{('e', 'r', 'w'): ['wer', 'erw'], ('a', 'c', 't'): ['cat', 'act']}

(Also written up in whi's answer.)


map-reduce:

This problem is also the poster child for map-reduce, where the reduction key you use is the sorted letters (or more efficiently, a hash). This will allow you to massively parallelize the problem.


If we assume the length of words is bounded, the groupby solution is O(#words log(#words)), while the hash solution is expected O(#words). In the unlikely event the length of words is arbitrary in length, sorting (O(length log(length)) per word) is less efficient than using an order-agnostic hash of the letters (O(length) per word). Sadly, collections.Counter is not hashable so you'd have to write your own.

Community
  • 1
  • 1
ninjagecko
  • 88,546
  • 24
  • 137
  • 145
2
words = ['cat','act','wer','erw']
dic={}
for w in words:
    k=''.join(sorted(w))
    dic.setdefault(k,[])
    dic[k].append(w)
print dic.values()

this is better in perform: O(n)

whi
  • 2,685
  • 6
  • 33
  • 40
0

You can find various solutions to anagrams of a single word at a time by googleing. It is likely that there would be a more efficient solver around than the obvious "search through all the words I know and see if they have the same letters".

Once you have one, you can put it into a function:

def anagrams(word):
    "return a list of all known anagrams of *word*"

Once you have that, generalising it to a list of words is trivial:

[anagrams(word) for word in words]
lvc
  • 34,233
  • 10
  • 73
  • 98
0

This one should do the trick in the style you prefer

[[w, w1] for w1 in words for w in words if w!=w1 and sorted(w1)==sorted(w)][::2]
xvatar
  • 3,229
  • 17
  • 20
  • This is O(len(words)**2) so if there are lots of words this will be very slow. You've nested `for w1 in words` in `for w in words`. – Nick Craig-Wood Jun 02 '12 at 08:41
  • @NickCraig-Wood you are right, this is not a fast solution, but it's "pythonic" :) – xvatar Jun 02 '12 at 08:43
  • and what [::2] means in your sulotion – Eyal Dreifuss Jun 02 '12 at 08:52
  • @EyalDreifuss [::2] means from the first element, take every second element – xvatar Jun 02 '12 at 09:06
  • @EyalDreifuss for your answer, this part `[w1 for w in words if w!=w1 and sorted(w1)==sorted(w)]` only gets each single element and makes each of them a list; even for that case, it should be `w` instead of `w1` at the beginning, otherwise you are just taking the `w1` from the outside loop – xvatar Jun 02 '12 at 09:10