Finding and grouping anagrams by Python

Question

input: ['abc', 'cab', 'cafe', 'face', 'goo']
output: [['abc', 'cab'], ['cafe', 'face'], ['goo']]

The problem is simple: it groups by anagrams. The order doesn't matter.

Of course, I can do this by C++ (that's my mother tongue). But, I'm wondering this can be done in a single line by Python. EDITED: If it's not possible, maybe 2 or 3 lines. I'm a newbie in Python.

To check whether two strings are anagram, I used sorting.

>>> input = ['abc', 'cab', 'cafe', 'face', 'goo']
>>> input2 = [''.join(sorted(x)) for x in input]
>>> input2
['abc', 'abc', 'acef', 'acef', 'goo']

I think it may be doable by combining map or so. But, I need to use a dict as a hash table. I don't know yet whether this is doable in a single line. Any hints would be appreicated!

In Ruby: `xs.group_by { |x| x.chars.sort.join }.values`. I wonder why Python hasn't (or has it?) a `group_by` function somewhere in the standard library (`itertools.groupby()` only groups on consecutive elements). Anyone? — tokland, Nov 18 '11 at 12:04
Also, check discussion section: https://leetcode.com/problems/group-anagrams/, people provide many interesting solutions there. — funnydman, Jul 23 '22 at 22:01
Do you actually want to *group* the list - i.e., look for only *consecutive* elements that are anagrams? Or do you want to *categorize* the elements according to their sorted form? In other words - if the input is `['abc', 'def', 'cab']`, should `'abc'` and `'cab'` be grouped together? — Karl Knechtel, Aug 01 '22 at 21:45

David Webb · Accepted Answer · 2011-11-18T12:00:19.790

A readable one-line solution:

output = [list(group) for key,group in groupby(sorted(words,key=sorted),sorted)]

For example:

>>> words = ['abc', 'cab', 'cafe', 'goo', 'face']
>>> from itertools import groupby
>>> [list(group) for key,group in groupby(sorted(words,key=sorted),sorted)]
[['abc', 'cab'], ['cafe', 'face'], ['goo']]

The key thing here is to use itertools.groupby from the itertools module which will group items in a list together.

The list we supply to groupby has to be sorted in advanced so we pass it sorted(words,key=sorted). The trick here is that sorted can take a key function and will sort based on the output from this function, so we pass sorted again as the key function and this will will sort the words using the letters of the string in order. There's no need to define our own function or create a lambda.

groupby takes a key function which it uses to tell if items should be grouped together and again we can just pass it the built-in sorted function.

The final thing to note is that the output is pairs of key and group objects, so we just take the grouper objects and use the list function to convert each of them to a list.

(BTW - I wouldn't call your variable input as then your hiding the built-in input function, although it's probably not one you should be using.)

@wutz - you're right, it needs to handle length in the initial sort. Will have loo — David Webb, Nov 18 '11 at 11:54
@wutz - fixed now by changing `sorted(words)` to `sorted(words,key=sorted)` — David Webb, Nov 18 '11 at 12:03
@wutz -Thanks, and thanks for your helping in testing it. :-) — David Webb, Nov 18 '11 at 12:07
Thanks for the amazing explanation. I'm trying to understand how the `key` and `key func` work for `groupby()`. I found that in your example, if you don't specify the keyfunc for groupby(), the key in the results will be 'abc', 'cab'...same as the list elements. However, after using `sorted` as keyfunc, key will become `['a', 'b', 'c']`...basically spell-outs from each group objects. Would you care to explain why it would do that please? Thanks. — Bowen Liu, Jan 10 '20 at 01:15

Adrien Plisson · Answer 2 · 2011-11-18T11:50:35.307

3

the unreadable, one-line solution:

>>> import itertools
>>> input = ['abc', 'face', 'goo', 'cab', 'cafe']
>>> [list(group) for key,group in itertools.groupby(sorted(input, key=sorted), sorted)]
[['abc', 'cab'], ['cafe', 'face'], ['goo']]

(well, it is really 2 lines if you count the import...)

edited Nov 18 '11 at 11:50

answered Nov 18 '11 at 11:34

Adrien Plisson

22,486
6
42
73

This fails if the anagrams are not adjacent in the input – wutz Nov 18 '11 at 11:38

score 2 · Answer 3 · answered Jun 03 '17 at 14:43

2

Dave's answer is concise, however the sort which is required by groupby is a O(n log(n)) operation. A faster solution is this:

from collections import defaultdict

def group_anagrams(strings):
    m = defaultdict(list)

    for s in strings:
        m[tuple(sorted(s))].append(s)

    return list(m.values())

answered Jun 03 '17 at 14:43

kakarukeys

21,481
10
35
48

The `groupby` approach works with `collections.Counter` as the key instead of `sorted`. In that case, it's linear. But `sorted` is actually quite fast and I suspect that unless the words are very very long, using `sorted` is faster. – Stef Oct 15 '21 at 13:27
1

@Stef the time complexity is w.r.t to the length of the list of strings passed into `group_anagrams` not the length of the string passed into the key function. Changing the key function to `Counter` does not help Dave's answer, the time complexity is still `O(n log(n))` because of `sorted` before `groupby` call. But you are right about one thing, if n is not large (a few millions), the actual runtime performance could be about the same. – kakarukeys Oct 16 '21 at 14:44
Oh, you're right. I got confused. Dave's answer uses `sorted` on the list of strings **and** as the key called on every string, and when I read "the sort" in your answer I thought about the key first for some reason. – Stef Oct 16 '21 at 14:47

score 2 · Answer 4 · answered Nov 18 '11 at 11:29

2

not a one liner but a solution...

d = {}
for item in input:
  s = "".join(sorted(item))
  if not d.has_key(s):
    d[s] = []
  d[s].append(item)
input2 = d.values()

answered Nov 18 '11 at 11:29

yurib

8,043
3
30
55

score 2 · Answer 5 · answered Nov 18 '11 at 11:31

The readable version:

from itertools import groupby
from operator import itemgetter

def norm(w):
  return "".join(sorted(w))

words = ['abc', 'cba', 'gaff', 'ffag', 'aaaa']

words_aug = sorted((norm(word), word) for word in words)

grouped = groupby(words_aug, itemgetter(0))

for _, group in grouped:
  print map(itemgetter(1), group)

The one-liner:

print list(list(anagrams for _, anagrams in group) for _, group in groupby(sorted(("".join(sorted(word)), word) for word in words), itemgetter(0)))

Prints:

[['aaaa'], ['abc', 'cba'], ['ffag', 'gaff']]

+1, I'd prefer using `[[anagrams...` instead of `list(list(anagrams` tho to improve readability — neurino, Nov 18 '11 at 11:40

Acorn · Answer 6 · 2011-11-18T11:59:02.427

1

from itertools import groupby

words = ['oog', 'abc', 'cab', 'cafe', 'face', 'goo', 'foo']

print [list(g) for k, g in groupby(sorted(words, key=sorted), sorted)]

Result:

[['abc', 'cab'], ['cafe', 'face'], ['foo'], ['oog', 'goo']]

You can't just use the groupby function, as that only groups together sequential elements for which your key function produces the same result.

The easy solution is just to sort the words first using the same function as you use for grouping.

edited Nov 18 '11 at 11:59

answered Nov 18 '11 at 11:41

Acorn

49,061
27
133
172

Yeah, overlooked that, plus the fact that the words have to be adjacent. Fixed. – Acorn Nov 18 '11 at 11:50

score 0 · Answer 7 · answered Oct 01 '21 at 17:04

Although the comments are 100% correct if you're tryna solve without using an import and its built in functions (idk for a brain teaser), then here you go

def sort_anagrams(li):
        new_li = []
    for i in li:
        tree = False
        for j in new_li:
            if sorted(i) == sorted(j[0]):
                j.append(i)
                tree = True
        if not tree:
            new_li.append([i])
    return new_li

And in usage:

list_of = ['abc', 'face', 'goo', 'cab', 'cafe']
print(sort_anagrams(list_of))

Output:

[['abc', 'cab'], ['cafe', 'face'], ['goo']]

Finding and grouping anagrams by Python

7 Answers7

Linked

Related