Be aware that your original method is O(#words²) time and thus will not scale to large datasets of more than, say, 10000 words.
groupby one-liner:
One of the most elegant (if weirdest) use cases I've ever seen for itertools.groupby:
>>> from itertools import groupby
>>> [list(v) for k, v in groupby(sorted(words, key=sorted), sorted)]
[['cat', 'act'], ['wer', 'erw']]
defaultdict three-liner:
Using collections.defaultdict, you can do:
from collections import defaultdict

anagrams = defaultdict(list)
for w in words:
    anagrams[tuple(sorted(w))].append(w)
If doing it your original way without any imports, you can emulate collections.defaultdict as follows:
anagrams = {}
for w in words:
    key = tuple(sorted(w))
    anagrams.setdefault(key, []).append(w)
example:
>>> anagrams
{('e', 'r', 'w'): ['wer', 'erw'], ('a', 'c', 't'): ['cat', 'act']}
(Also written up in whi's answer.)
map-reduce:
This problem is also the poster child for map-reduce, where the reduction key you use is the sorted letters (or more efficiently, a hash). This will allow you to massively parallelize the problem.
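As a minimal single-process sketch of that idea (the function names here are illustrative, not from any particular framework), map each word to a (key, word) pair and then reduce by key:

```python
from collections import defaultdict

def map_phase(words):
    # Emit (key, word) pairs; the key is the word's sorted letters.
    for w in words:
        yield (tuple(sorted(w)), w)

def reduce_phase(pairs):
    # Group all words sharing a key; in a real map-reduce framework
    # this shuffle/reduce step is what runs in parallel across workers.
    groups = defaultdict(list)
    for key, w in pairs:
        groups[key].append(w)
    return list(groups.values())

words = ['cat', 'wer', 'act', 'erw']
print(reduce_phase(map_phase(words)))  # [['cat', 'act'], ['wer', 'erw']]
```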
If we assume the length of words is bounded, the groupby solution is O(#words log(#words)), while the hash solution is expected O(#words). In the unlikely event the length of words is arbitrary, sorting (O(length log(length)) per word) is less efficient than using an order-agnostic hash of the letters (O(length) per word). Sadly, collections.Counter is not hashable, so you'd have to write your own.
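One simple way to get an order-agnostic, hashable key in O(length) per word (a sketch, not the only option) is to freeze the letter counts:

```python
from collections import Counter, defaultdict

def letter_key(word):
    # frozenset of (letter, count) pairs: hashable, order-agnostic,
    # and built in O(len(word)) time.
    return frozenset(Counter(word).items())

anagrams = defaultdict(list)
for w in ['cat', 'wer', 'act', 'erw']:
    anagrams[letter_key(w)].append(w)

print(list(anagrams.values()))  # [['cat', 'act'], ['wer', 'erw']]
```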