Source : Microsoft Interview Question
We are given a File containing words.We need to determine all the Anagrams Present in it .
Can someone suggest most optimal algorithm to do this.
Only way i know is Sorting all the words,then checking .
Source : Microsoft Interview Question
We are given a File containing words.We need to determine all the Anagrams Present in it .
Can someone suggest most optimal algorithm to do this.
Only way i know is Sorting all the words,then checking .
It would be good to know more about data before suggesting an algorithm, but lets just assume that the words are in English in the single case.
Lets assign each letter a prime number from 2 to 101. For each word we can count it's "anagram number" by multiplying its letter corresponding numbers.
Lets declare a dictionary of {number, list} pairs. And one list to collect resulting anagrams into.
Then we can collect anagrams in two steps: simply traverse through the file and put each word to a dictionary's list according to its "anagram number"; traverce the map and for every pairs list with length more then 1 store it's contents in a single big anagram list.
UPDATE:
import operator
words = ["thore", "ganamar", "notanagram", "anagram", "other"]
letter_code = {'a':2, 'b':3, 'c':5, 'd':7, 'e':11, 'f':13, 'g':17, 'h':19, 'i':23, 'j':29, 'k':31, 'l':37, 'm':41, 'n':43,
'o':47, 'p':53, 'q':59, 'r':61, 's':67, 't':71, 'u':73, 'v':79, 'w':83, 'x':89, 'y':97, 'z':101}
def evaluate(word):
return reduce( operator.mul, [letter_code[letter] for letter in word] )
anagram_map = {}
anagram_list = []
for word in words:
anagram_number = evaluate(word)
if anagram_number in anagram_map:
anagram_map[ anagram_number ] += [word]
else:
anagram_map[ anagram_number ] = [word]
if len(anagram_map[ anagram_number ]) == 2:
anagram_list += anagram_map[ anagram_number ]
elif len(anagram_map[ anagram_number ]) > 2:
anagram_list += [ word ]
print anagram_list
Of course the implementation can be optimized further. For instance, you don't really need a map of anagrams, just a counters would do fine. But I guess the code illustrates the idea best as it is.
You can use "Tries".A trie (derived from retrieval) is a multi way search tree. Tries use pattern matching algorithms. It's basic use is to create spell check programs, but I think it can help your case.. Have a look at this link http://ww0.java4.datastructures.net/handouts/Tries.pdf
I just did this one not to long ago, in a different way.
public static void allAnagrams2(String s) { String[] input = s.toLowerCase().replaceAll("[^a-z^\s]", "").split("\s"); HashMap> hm = new HashMap>();
for (int i = 0; i < input.length; i++) {
String current = input[i];
char[] chars = current.toCharArray();
Arrays.sort(chars);
String key = new String(chars);
LinkedList<String> ll = hm.containsKey(key) ? hm.get(key) : new LinkedList<String>();
ll.add(current);
if (!hm.containsKey(key))
hm.put(key, ll);
}
}
Slightly different approach from the one above. Returning a Hashmap of anagrams instead.
Public static Hashmap<String> anagrams(String [] list){
Hashmap<String, String> hm = new Hashmap<String, String>();
Hashmap<String> anagrams = new Hashmap<String>();
for (int i=0;i<list.length;i++){
char[] chars = list[i].toCharArray();
Arrays.sort(chars);
String k = chars.toString();
if(hm.containsKey(k)){
anagrams.put(k);
anagrams.put(hm.get(k));
}else{
hm.put(k, list[i]);
}
}
}