8

Source : Microsoft Interview Question

We are given a File containing words.We need to determine all the Anagrams Present in it .

Can someone suggest most optimal algorithm to do this.

Only way i know is Sorting all the words,then checking .

Spandan
  • 2,128
  • 5
  • 25
  • 37

4 Answers4

15

It would be good to know more about data before suggesting an algorithm, but lets just assume that the words are in English in the single case.

Lets assign each letter a prime number from 2 to 101. For each word we can count it's "anagram number" by multiplying its letter corresponding numbers.

Lets declare a dictionary of {number, list} pairs. And one list to collect resulting anagrams into.

Then we can collect anagrams in two steps: simply traverse through the file and put each word to a dictionary's list according to its "anagram number"; traverce the map and for every pairs list with length more then 1 store it's contents in a single big anagram list.

UPDATE:

import operator

words = ["thore", "ganamar", "notanagram", "anagram", "other"]

letter_code = {'a':2, 'b':3, 'c':5, 'd':7, 'e':11, 'f':13, 'g':17, 'h':19, 'i':23, 'j':29, 'k':31, 'l':37, 'm':41, 'n':43, 
            'o':47, 'p':53, 'q':59, 'r':61, 's':67, 't':71, 'u':73, 'v':79, 'w':83, 'x':89, 'y':97, 'z':101}

def evaluate(word):
    return reduce( operator.mul, [letter_code[letter] for letter in word] )

anagram_map = {}
anagram_list = []
for word in words:
    anagram_number = evaluate(word)
    if anagram_number in anagram_map:
        anagram_map[ anagram_number ] += [word]
    else:
        anagram_map[ anagram_number ] = [word]

    if len(anagram_map[ anagram_number ]) == 2:
        anagram_list += anagram_map[ anagram_number ] 
    elif len(anagram_map[ anagram_number ]) > 2:
        anagram_list += [ word ]

print anagram_list

Of course the implementation can be optimized further. For instance, you don't really need a map of anagrams, just a counters would do fine. But I guess the code illustrates the idea best as it is.

akalenuk
  • 3,815
  • 4
  • 34
  • 56
3

You can use "Tries".A trie (derived from retrieval) is a multi way search tree. Tries use pattern matching algorithms. It's basic use is to create spell check programs, but I think it can help your case.. Have a look at this link http://ww0.java4.datastructures.net/handouts/Tries.pdf

  • 1
    Nope, it won't. The question is designed in a way to throw people off track and start thinking about Tries. Tries help for exact matches not for anagrams. – Chandranshu Jul 03 '14 at 07:10
  • can use tries, but it'll not make any difference from using hashmap, still needed to sort each word and fill it into the trie or hashmap. – Saurabh Singh Apr 17 '20 at 18:04
2

I just did this one not to long ago, in a different way.

  1. split the file content into an array of words
  2. create a HashMap that maps a key string to a linked list of strings
  3. for each word in the array, sort the letters in the word and use that as the key to a linked list of anagrams

public static void allAnagrams2(String s) { String[] input = s.toLowerCase().replaceAll("[^a-z^\s]", "").split("\s"); HashMap> hm = new HashMap>();

    for (int i = 0; i < input.length; i++) {
        String current = input[i];

        char[] chars = current.toCharArray();
        Arrays.sort(chars);
        String key = new String(chars);

        LinkedList<String> ll = hm.containsKey(key) ? hm.get(key) : new LinkedList<String>();
        ll.add(current);

        if (!hm.containsKey(key))
            hm.put(key, ll);
    }
}
Adrian Bratu
  • 498
  • 2
  • 9
0

Slightly different approach from the one above. Returning a Hashmap of anagrams instead.

Public static Hashmap<String> anagrams(String [] list){

    Hashmap<String, String> hm = new Hashmap<String, String>();
    Hashmap<String> anagrams = new Hashmap<String>();

    for (int i=0;i<list.length;i++){
        char[] chars = list[i].toCharArray();
        Arrays.sort(chars);
        String k = chars.toString();
        if(hm.containsKey(k)){
            anagrams.put(k);
            anagrams.put(hm.get(k));
        }else{
            hm.put(k, list[i]); 
        }
    }
}
Algorithmatic
  • 1,824
  • 2
  • 24
  • 41