All anagrams in a File

Question

Source : Microsoft Interview Question

We are given a File containing words.We need to determine all the Anagrams Present in it .

Can someone suggest most optimal algorithm to do this.

Only way i know is Sorting all the words,then checking .

How are they measuring "optimal"? Quickest to implement? Fastest to run? Least memory used? Most accurate in counting anagrams? — Trevor Tippins, Jun 01 '13 at 12:09
It seems that you already know best method: to sort all letters in each word alphabetically, then compare words with each other (by means of sort or hash). — Egor Skriptunoff, Jun 01 '13 at 12:14
Possible duplicate of - http://stackoverflow.com/questions/18781106/generate-same-unique-hash-code-for-all-anagrams/18785523#18785523 — Leeor, Nov 12 '13 at 00:49

akalenuk · Accepted Answer · 2013-06-01T13:19:00.213

It would be good to know more about data before suggesting an algorithm, but lets just assume that the words are in English in the single case.

Lets assign each letter a prime number from 2 to 101. For each word we can count it's "anagram number" by multiplying its letter corresponding numbers.

Lets declare a dictionary of {number, list} pairs. And one list to collect resulting anagrams into.

Then we can collect anagrams in two steps: simply traverse through the file and put each word to a dictionary's list according to its "anagram number"; traverce the map and for every pairs list with length more then 1 store it's contents in a single big anagram list.

UPDATE:

import operator

words = ["thore", "ganamar", "notanagram", "anagram", "other"]

letter_code = {'a':2, 'b':3, 'c':5, 'd':7, 'e':11, 'f':13, 'g':17, 'h':19, 'i':23, 'j':29, 'k':31, 'l':37, 'm':41, 'n':43, 
            'o':47, 'p':53, 'q':59, 'r':61, 's':67, 't':71, 'u':73, 'v':79, 'w':83, 'x':89, 'y':97, 'z':101}

def evaluate(word):
    return reduce( operator.mul, [letter_code[letter] for letter in word] )

anagram_map = {}
anagram_list = []
for word in words:
    anagram_number = evaluate(word)
    if anagram_number in anagram_map:
        anagram_map[ anagram_number ] += [word]
    else:
        anagram_map[ anagram_number ] = [word]

    if len(anagram_map[ anagram_number ]) == 2:
        anagram_list += anagram_map[ anagram_number ] 
    elif len(anagram_map[ anagram_number ]) > 2:
        anagram_list += [ word ]

print anagram_list

Of course the implementation can be optimized further. For instance, you don't really need a map of anagrams, just a counters would do fine. But I guess the code illustrates the idea best as it is.

I am not sure i follow this. Can you please post an example implementation in a UPDATE . — Spandan, Jun 01 '13 at 12:28
I follow it now.Now 1 important thing i am unsure is its correctness.Cant the product of two diffrent set of numbers be same ? — Spandan, Jun 01 '13 at 13:41
Amazing, I was thinking about a good hash function for similar problems. — Adroit, Sep 12 '14 at 08:58
Wonderful. I always mesmerise when I see magical uses of prime numbers. :D — loknath, Oct 26 '17 at 05:33
Genius!. I was always thinking for O(N) time and O(1) space solution. 1 drawback I see for this solution is overflow issues for large strings and large alphabets. — Freeze Francis, Feb 28 '21 at 07:08

score 3 · Answer 2 · answered Jun 01 '13 at 12:28

3

You can use "Tries".A trie (derived from retrieval) is a multi way search tree. Tries use pattern matching algorithms. It's basic use is to create spell check programs, but I think it can help your case.. Have a look at this link http://ww0.java4.datastructures.net/handouts/Tries.pdf

answered Jun 01 '13 at 12:28

Shashank Raghunath

149
3
15

1

Nope, it won't. The question is designed in a way to throw people off track and start thinking about Tries. Tries help for exact matches not for anagrams. – Chandranshu Jul 03 '14 at 07:10
can use tries, but it'll not make any difference from using hashmap, still needed to sort each word and fill it into the trie or hashmap. – Saurabh Singh Apr 17 '20 at 18:04

score 2 · Answer 3 · answered Nov 12 '13 at 00:19

I just did this one not to long ago, in a different way.

split the file content into an array of words
create a HashMap that maps a key string to a linked list of strings
for each word in the array, sort the letters in the word and use that as the key to a linked list of anagrams

public static void allAnagrams2(String s) { String[] input = s.toLowerCase().replaceAll("[^a-z^\s]", "").split("\s"); HashMap> hm = new HashMap>();

    for (int i = 0; i < input.length; i++) {
        String current = input[i];

        char[] chars = current.toCharArray();
        Arrays.sort(chars);
        String key = new String(chars);

        LinkedList<String> ll = hm.containsKey(key) ? hm.get(key) : new LinkedList<String>();
        ll.add(current);

        if (!hm.containsKey(key))
            hm.put(key, ll);
    }
}

Time complexity is much more compared to akalenuk's answer. Because it involved sorting. — loknath, Oct 26 '17 at 05:32

score 0 · Answer 4 · answered Jan 16 '14 at 07:14

Slightly different approach from the one above. Returning a Hashmap of anagrams instead.

Public static Hashmap<String> anagrams(String [] list){

    Hashmap<String, String> hm = new Hashmap<String, String>();
    Hashmap<String> anagrams = new Hashmap<String>();

    for (int i=0;i<list.length;i++){
        char[] chars = list[i].toCharArray();
        Arrays.sort(chars);
        String k = chars.toString();
        if(hm.containsKey(k)){
            anagrams.put(k);
            anagrams.put(hm.get(k));
        }else{
            hm.put(k, list[i]); 
        }
    }
}

All anagrams in a File

4 Answers4