2

I want to find all possible anagrams from a phrase, for example if I input "Donald Trump" I should get "Darn mud plot", "Damp old runt" and probably hundreds more.

I have a dictionary of around 100,000 words, no problems there.

But the only way I can think of is to loop through the dictionary and add all words that can be built from the input to a list. Then loop through the list and if the word length is less than the length of the input, loop through the dictionary again add all possible words that can be made from the remaining letters that would make it the length of the input or less. And keep looping through until I have all combinations of valid words of length equal to input length.

But this is O(n!) complexity, and it would take almost forever to run. I've tried it.

Is there any way to approach this problem such that the complexity will be less? I may have found something on the net for perl, but I absolutely cannot read perl code, especially not perl golf.

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
David Kane
  • 91
  • 10
  • This sorta seems like a http://codereview.stackexchange.com/ question. This also has been asked before on that site as well. Check out http://codereview.stackexchange.com/questions/75023/optimizing-an-anagram-solver – TehTris Nov 21 '16 at 16:34
  • 1
    @TehTris a [codereview.se] question contains real, actual, working code ready to be peer reviewed. This would be off-topic on CR. – Mathieu Guindon Nov 21 '16 at 16:40
  • It looks like a duplicate of this: http://stackoverflow.com/questions/55210/algorithm-to-generate-anagrams – ChatterOne Nov 22 '16 at 07:59

1 Answers1

3

I like your idea of filtering the word list down to just the words that could possibly be made with the input letters, and I like the idea of trying to string them together, but I think there are a few major optimizations you could put into place that would likely speed things up quite a bit.

For starters, rather than choosing a word and then rescanning the entire dictionary for what's left, I'd consider just doing a single filtering pass at the start to find all possible words that could be made with the letters that you have. Your dictionary is likely going to be pretty colossal (150,000+, I'd suspect), so rescanning it after each decision point is going to be completely infeasible. Once you have the set of words you can legally use in the anagram, from there you're left with the problem of finding which combinations of them can be used to form a complete anagram of the sentence.

I'd begin by finding unordered lists of words that anagram to the target rather than all possible ordered lists of words, because there's many fewer of them to find. Once you have the unordered lists, you can generate the permutations from them pretty quickly.

To do this, I'd use a backtracking recursion where at each point you maintain a histogram of the remaining letter counts. You can use that to filter out words that can't be added in any more, and this essentially saves you the cost of having to check the whole dictionary each time. I'd imagine this recursion will dead-end a lot, and that you'll probably find all your answers without too much effort.

You might consider some other heuristics along the way. For example, you might want to start with larger words first to pull out as many letters as possible and keep the branching factor low. To do that, you could sort your word list from longest to shortest and try the words in that order. You could alternatively try to use the most constrained letters up first to decrease the branching factor. These sorts of heuristics will probably work really well in practice.

Overall you're still looking at exponential work in the worst case, but it shouldn't be too bad for shorter strings.

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065