When you want to find all anagrams, it’s not recommended to compare each word with all other words: you’ll end up comparing every word with every other word, so the number of comparisons grows quadratically with the number of words (quadratic time complexity). Processing 1,000 words needs on the order of one million comparisons, processing 100,000 words needs on the order of 10,000,000,000 comparisons, and so on.
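To put numbers on this, the following sketch counts the calls made by a naive pairwise search. The `isAnagram` below is a hypothetical stand-in (it compares sorted characters), not the method from the question, and `NaiveAnagrams`/`countComparisons` are illustrative names. Even if each unordered pair is checked only once, there are n(n-1)/2 pairs, so 1,000 words still take 499,500 comparisons and the growth stays quadratic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NaiveAnagrams {
    // Hypothetical stand-in for the question's isAnagram: compare the sorted characters
    static boolean isAnagram(String a, String b) {
        char[] x = a.toCharArray();
        char[] y = b.toCharArray();
        Arrays.sort(x);
        Arrays.sort(y);
        return Arrays.equals(x, y);
    }

    // Checks every unordered pair exactly once and returns how many isAnagram calls were made.
    // The number of pairs is n*(n-1)/2, i.e. O(n^2).
    static long countComparisons(List<String> words) {
        long comparisons = 0;
        for (int i = 0; i < words.size(); i++) {
            for (int j = i + 1; j < words.size(); j++) {
                isAnagram(words.get(i), words.get(j)); // result ignored; only the call count matters here
                comparisons++;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            words.add("w" + i);
        }
        System.out.println(countComparisons(words)); // 499500

        // Running 100,000 words this way would be far too slow, so only evaluate the pair formula
        long n = 100_000;
        System.out.println(n * (n - 1) / 2); // 4999950000
    }
}
```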
You may change your `isAnagram` method to provide a lookup key for data structures like `HashMap`:
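```java
import java.nio.CharBuffer;
import java.util.Arrays;

// Sorting the characters gives all anagrams of a word the same key
static CharBuffer getAnagramKey(String s) {
    char[] word1 = s.replaceAll("[\\s]", "").toCharArray();
    Arrays.sort(word1);
    return CharBuffer.wrap(word1);
}
```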
The class `CharBuffer` wraps a `char[]` array and provides the necessary `equals` and `hashCode` methods without copying the array contents, which makes it preferable to constructing a new `String`.
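A small demonstration of why the wrapped array works as a map key, assuming the `getAnagramKey` method shown above (copied into a hypothetical `AnagramKeyDemo` class so it runs on its own): two separately created buffers with the same characters are equal and have the same hash code, while the raw `char[]` arrays behind them only have identity-based `equals`.

```java
import java.nio.CharBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class AnagramKeyDemo {
    // Same key function as above: strip whitespace, sort the chars, wrap without copying
    static CharBuffer getAnagramKey(String s) {
        char[] word1 = s.replaceAll("[\\s]", "").toCharArray();
        Arrays.sort(word1);
        return CharBuffer.wrap(word1);
    }

    public static void main(String[] args) {
        CharBuffer a = getAnagramKey("listen");
        CharBuffer b = getAnagramKey("silent");
        System.out.println(a);                            // eilnst
        // Distinct buffers compare by their remaining characters...
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
        // ...whereas the backing arrays use Object's identity-based equals
        System.out.println(a.array().equals(b.array()));  // false

        // Hence a HashMap can find earlier anagrams through a freshly built key
        Map<CharBuffer, String> seen = new HashMap<>();
        seen.put(a, "listen");
        System.out.println(seen.get(getAnagramKey("enlist"))); // listen
    }
}
```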
As a side note, `.replaceAll("[\\s]", "")` could be simplified to `.replaceAll("\\s", "")`; both eliminate all whitespace characters, though the example input in your question contains none. To remove all non-word characters like apostrophes and ampersands, you could use `s.replaceAll("\\W", "")`.
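A quick check of the regex variants mentioned above; `RegexDemo`, `stripWhitespace` and `stripNonWord` are hypothetical names used only for illustration. By default `\\W` means "anything outside `[a-zA-Z_0-9]`", so it also removes whitespace and, unless `Pattern.UNICODE_CHARACTER_CLASS` is enabled, non-ASCII letters such as `é`.

```java
public class RegexDemo {
    // "[\\s]" and "\\s" denote the same character class: any whitespace character
    static String stripWhitespace(String s) {
        return s.replaceAll("\\s", "");
    }

    // "\\W" matches everything outside [a-zA-Z_0-9], so whitespace is removed as well
    static String stripNonWord(String s) {
        return s.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        String phrase = "rock & roll's";
        System.out.println(phrase.replaceAll("[\\s]", "")); // rock&roll's
        System.out.println(stripWhitespace(phrase));        // rock&roll's
        System.out.println(stripNonWord(phrase));           // rockrolls
        // Without Pattern.UNICODE_CHARACTER_CLASS, a non-ASCII letter counts as a non-word character
        System.out.println(stripNonWord("caf\u00e9"));      // caf
    }
}
```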
Then you can find all anagrams in a single linear pass over the words, e.g.:
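```java
import java.io.*;
import java.net.URL;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// The enclosing method must declare or handle IOException
URL srcURL = new URL("http://wiki.puzzlers.org/pub/wordlists/unixdict.txt");
try(InputStream is = srcURL.openStream();
    BufferedReader reader = new BufferedReader(new InputStreamReader(is));
    Stream<String> stream = reader.lines()) {
    // One pass: group every word under its anagram key, then keep groups with 2+ words
    stream.collect(Collectors.groupingBy(s -> getAnagramKey(s)))
          .values().stream()
          .filter(l -> l.size() > 1)
          .forEach(System.out::println);
}
```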
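For experimenting without a network connection, here is the same grouping as a self-contained sketch; the in-memory word list stands in for the downloaded `unixdict.txt`, and `AnagramGroups`/`anagramGroups` are hypothetical names. Each inner list keeps the words in their input order, while the order of the groups follows `HashMap` iteration and is not specified.

```java
import java.nio.CharBuffer;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class AnagramGroups {
    // Anagram key: whitespace removed, characters sorted, array wrapped without copying
    static CharBuffer getAnagramKey(String s) {
        char[] word1 = s.replaceAll("\\s", "").toCharArray();
        Arrays.sort(word1);
        return CharBuffer.wrap(word1);
    }

    // Groups words by anagram key in one pass and keeps only groups with more than one word
    static List<List<String>> anagramGroups(Stream<String> words) {
        Map<CharBuffer, List<String>> byKey =
            words.collect(Collectors.groupingBy(AnagramGroups::getAnagramKey));
        return byKey.values().stream()
                    .filter(l -> l.size() > 1)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A small in-memory word list instead of the downloaded file
        Stream<String> words = Stream.of(
            "listen", "silent", "enlist", "google", "evil", "vile", "live", "apple");
        anagramGroups(words).forEach(System.out::println);
    }
}
```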
With this solution, printing likely becomes the most expensive part for larger word lists, so you might change the stream pipeline. For example, the following prints only the ten largest anagram groups:
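```java
// Replaces the pipeline inside the try block
// (additionally needs java.util.Collections, java.util.Comparator and java.util.List)
stream.collect(Collectors.groupingBy(s -> getAnagramKey(s)))
      .values().stream()
      .filter(l -> l.size() > 1)
      .sorted(Collections.reverseOrder(Comparator.comparingInt(List::size))) // largest groups first
      .limit(10)
      .forEach(System.out::println);
```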
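A self-contained sketch of the top-N variant that collects the groups into a list instead of printing them; the word list, `TopAnagramGroups` and `topGroups` are hypothetical. It uses `Comparator.reversed()`, which is equivalent to wrapping the comparator in `Collections.reverseOrder`. Groups of equal size come out in an unspecified order, because the `HashMap` values are not ordered.

```java
import java.nio.CharBuffer;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TopAnagramGroups {
    // Anagram key: whitespace removed, characters sorted, array wrapped without copying
    static CharBuffer getAnagramKey(String s) {
        char[] chars = s.replaceAll("\\s", "").toCharArray();
        Arrays.sort(chars);
        return CharBuffer.wrap(chars);
    }

    // Returns at most n anagram groups, largest first
    static List<List<String>> topGroups(Stream<String> words, int n) {
        return words.collect(Collectors.groupingBy(TopAnagramGroups::getAnagramKey))
                    .values().stream()
                    .filter(l -> l.size() > 1)
                    // Explicit lambda type: .reversed() gives comparingInt no target type to infer from
                    .sorted(Comparator.comparingInt((List<String> l) -> l.size()).reversed())
                    .limit(n)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<String> words = Stream.of(
            "evil", "vile", "live", "veil", "listen", "silent", "dusty", "study", "apple");
        topGroups(words, 2).forEach(System.out::println);
    }
}
```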