4

I want to find create lists of anagrams from a list of words. Should I use another loop in my code or recursion?

some_list = ['bad', 'app', 'sad', 'mad', 'dab','pge', 'bda', 'ppa', 'das', 'dba']

new_list = [some_list[0]]
i = 0
while i+1 < len(some_list):
    if (''.join(sorted(some_list[0]))) == (''.join(sorted(some_list[i+1]))):
        new_list.append(some_list[i+1])
        i = i+1
    else:
        i = i+1

print(new_list)

  • My output is ['bad', 'dab', 'bda', 'dba']. But I also want more lists of other anagrams from some_list.

I want the output to be: - ['app', 'ppa'] - ['bad', 'dab', 'bda', 'dba'] - ['sad', 'das']

pk.
  • 99
  • 3
  • 12

7 Answers7

6

I recommend you write Python, not Java or whatever other language you're emulating there. Here's your core code in Python, with normal looping and without all the unnecessary stuff:

new_list = [some_list[0]]
for word in some_list[1:]:
    if sorted(some_list[0]) == sorted(word):
        new_list.append(word)

I don't see use for recursion, but yes, you could wrap an outer loop around this to find the other anagram groups.


Though this is how I'd do it, using the helpful itertools.groupby:

for _, group in groupby(sorted(some_list, key=sorted), sorted):
    group = list(group)
    if len(group) > 1:
        print(group)

That prints:

['bad', 'dab', 'bda', 'dba']
['sad', 'das']
['app', 'ppa']

Alternative solution for the changed question with sorting the groups:

groups = (list(group) for _, group in groupby(sorted(some_list, key=sorted), sorted))
print([group for group in sorted(groups) if len(group) > 1])

Output:

[['app', 'ppa'], ['bad', 'dab', 'bda', 'dba'], ['sad', 'das']]
Stefan Pochmann
  • 27,593
  • 8
  • 44
  • 107
  • When it prints, it does not print the lists in an alphabetical order, i.e; ['app', 'ppa'] ['bad', 'dab', 'bda', 'dba'] ['sad', 'das'] – pk. May 14 '15 at 21:40
  • Yes. So? Your *"I want the output to be:"* part in your question doesn't have it sorted. – Stefan Pochmann May 14 '15 at 21:43
  • In general, not a good thing. You're invalidating the valid already posted answers that way. And are you sure you have the final version now? You have 'dab' before 'bda' in the second group. – Stefan Pochmann May 14 '15 at 21:59
  • Yes. The lists only need to be printed in alphabetical order. – pk. May 14 '15 at 22:06
  • Ok, I've added a solution for that. – Stefan Pochmann May 14 '15 at 22:45
  • I am writing the output to a file, but it still is not writing the results in alphabetical order. ['abolitionism', 'mobilisation'] ['accouterment', 'accoutrement'] ['alterability', 'bilaterality'] ['amphitheater', 'amphitheatre'] ['behaviourism', 'misbehaviour'] ['conservation', 'conversation'] ['commissioned', 'decommission'] ['discreetness', 'discreteness'] ['impressively', 'permissively'] ['microcephaly', 'pyrochemical'] ['paradisaical', 'paradisiacal'] ['restrengthen', 'strengthener'] ['unimpressive', 'unpermissive'] ['inactivation', 'vaticination'] – pk. May 15 '15 at 08:07
  • @azhar You must have used some wrong code. The correct one (the two-liner on the bottom of my answer) can't produce that order. I suspect you replaced my second line with your own and forgot the `sorted(...)`. – Stefan Pochmann May 15 '15 at 13:01
3

Your problem is that you loop one time over your list ,since you need to loop based on all of the words.

But i suggest another way for this task,you can use itertools.groupby and sorted function using operator.itemgetter :

some_list = ['bad', 'app', 'sad', 'mad', 'dab','pge', 'bda', 'ppa', 'das', 'dba']

from operator import itemgetter
from itertools import groupby 
s=sorted([(i,''.join(sorted(j))) for i,j in enumerate(some_list)],key=itemgetter(1))
inds= [zip(*g)[0] for _,g in groupby(s,itemgetter(1))]
print [itemgetter(*i)(some_list) for i in inds]

Result :

[('bad', 'dab', 'bda', 'dba'), 'mad', ('sad', 'das'), ('app', 'ppa'), 'pge']

All that i have done here is creating a list of sorted words with those index using sorted and enumerate :

sorted([(i,''.join(sorted(j))) for i,j in enumerate(some_list)],key=itemgetter(1))
[(0, 'abd'), (4, 'abd'), (6, 'abd'), (9, 'abd'), (3, 'adm'), (2, 'ads'), (8, 'ads'), (1, 'app'), (7, 'app'), (5, 'egp')]

then we need to grouping this pairs based on the second element and get the first element (indices) so we will have the following list of tuples :

[(0, 4, 6, 9), (3,), (2, 8), (1, 7), (5,)]

that each tuple is contain the indices of the words that those sorted representations are same.

and at last all you need is picking up the elements of the main list based on the preceding indices :

[itemgetter(*i)(some_list) for i in inds]
[('bad', 'dab', 'bda', 'dba'), 'mad', ('sad', 'das'), ('app', 'ppa'), 'pge']
Mazdak
  • 105,000
  • 18
  • 159
  • 188
  • @StefanPochmann No its not, as you can understand from explanation the logic is simple! – Mazdak May 14 '15 at 20:36
  • I mean complicated in the sense that you're doing a lot of stuff that you don't need to. Going to indexes and back for no reason. – Stefan Pochmann May 14 '15 at 20:40
  • @StefanPochmann Yeah i dont need to sort and then grub the indices i could simply used `sorted` as the key,i see you code the part `key=sorted` is what i missed and you got my vote for that! – Mazdak May 14 '15 at 20:44
0

1) Create a function anagrams(word) that is returning a list of anagrams for a single word as your code does.
2) map the function over your list of words.

Eugene Sh.
  • 17,802
  • 8
  • 40
  • 61
0

Here is a solution:

from itertools import groupby
some_list = ['bad', 'app', 'sad', 'mad', 'dab','pge', 'bda', 'ppa', 'das', 'dba']
some_list_ordered = map( lambda x : "".join( sorted( x) ), some_list )
some_lists = sorted(zip( some_list_ordered, some_list ) )
anagrams = filter( lambda x : len( x ) > 1, [ zip(*v)[1]  for k,v in groupby( some_lists, lambda x : x[0] ) ] )    

for a in anagrams:
    print a

#('bad', 'bda', 'dab', 'dba')
#('das', 'sad')
#('app', 'ppa')
dermen
  • 5,252
  • 4
  • 23
  • 34
0

The natural way to do this if you can afford the memory overhead of an additional dictionary seems to me to be:

words = ['bad', 'app', 'sad', 'mad', 'dab','pge', 'bda', 'ppa', 'das', 'dba']

anagrams = {}
for word in words:
    sword = ''.join(sorted(word))
    try:
        anagrams[sword].append(word)
    except KeyError:
        anagrams[sword] = [word]

anagrams_list = [v for v in anagrams.values() if len(v) > 1]
print anagrams_list

Output:

[['app', 'ppa'], ['bad', 'dab', 'bda', 'dba'], ['sad', 'das']]

EDIT: As mentioned in the comment below you can replace the try...except block with the dict method setdefault if the syntax doesn't bother you:

words = ['bad', 'app', 'sad', 'mad', 'dab','pge', 'bda', 'ppa', 'das', 'dba']

anagrams = {}
for word in words:
    sword = ''.join(sorted(word))
    anagrams.setdefault(sword, []).append(word)

anagrams_list = [v for v in anagrams.values() if len(v) > 1]
print anagrams_list
xnx
  • 24,509
  • 11
  • 70
  • 109
  • Do you know `list.setdefault`? That could replace your whole try/except part. – Stefan Pochmann May 14 '15 at 20:45
  • @StefanPochmann I find the syntax for this a bit upsetting because even though it's called `setdefault` you're actually _getting_ something, but I've given an alternative. – xnx May 14 '15 at 21:00
  • Yeah, originally I also found it a bit confusing, but I got used to it :-). Though I still mostly use defaultdict instead, unless I have a good reason not to. – Stefan Pochmann May 14 '15 at 21:08
0

You can group the words in a dict, using the sorted word as the key, filtering out the words that values that don't have at least two elements, using an OrderedDict to keep order:

some_list = ['bad', 'app', 'sad', 'mad', 'dab','pge', 'bda', 'ppa', 'das', 'dba']


from collections import OrderedDict

od = OrderedDict()
for ele in some_list:
    srt = "".join(sorted(ele))
    od.setdefault(srt,[]).append(ele)

print(filter(lambda x: len(x) > 1, od.values()))


[['bad', 'dab', 'bda', 'dba'], ['app', 'ppa'], ['sad', 'das']]

Or using a loop and appending to a list, using a temp list to gather common words:

new_list = []
from collections import OrderedDict
for ele in OrderedDict.fromkeys("".join(sorted(ele)) for ele in some_list):
    temp = []
    for s in some_list:
        if ele == ''.join(sorted(s)):
            temp.append(s)
    if len(temp) > 1:
        new_list.append(temp)

If order does not matter a defaultdict would be more efficient:

from collections import defaultdict

d = defaultdict(list)
for ele in some_list:
    d[''.join(sorted(ele))].append(ele)

print(filter(lambda x: len(x) > 1, d.values()))
[['app', 'ppa'], ['bad', 'dab', 'bda', 'dba'], ['sad', 'das']]
Padraic Cunningham
  • 176,452
  • 29
  • 245
  • 321
-1
import java.util.*;

public class GroupAnagramsTogether {
    public static void main(String[] args)
 {
        String [] input = new String [] {"bad", "app", "sad", "mad", "dab","pge", "bda", "ppa", "das", "dba"};
        System.out.println("Input: " + Arrays.toString(input));

        List<List<String>> result = groupAnagram(input);
        System.out.println(result);
    }

    private static List<List<String>> groupAnagram(String[] input) 
{
        List<List<String>> list = new ArrayList<List<String>>();
        
        HashMap<String, List<String>> mp = new HashMap<String, List<String>>();
        
        for(String s : input)
        {
            char[] ch = s.toCharArray();
            Arrays.sort(ch);
            
            String key = new String(ch);
            if(mp.containsKey(key))
            {
                mp.get(key).add(s);
            }
            else
            {
                List<String> strList = new ArrayList<String>();
                strList.add(s);
                mp.put(key,strList);
            }
        }
        list.addAll(mp.values());
        return list;
    }

}
Leo
  • 1
  • 2