0

I am trying to implement a little game with the following rules: Given a set of random letters (e.g. 10), I want to find all possible words one can form out of these letters. I am using a standard dictionary for this.

Letters are allowed to be used multiple times, and not all letters have to be used, as long it results in words with 4 characters or more. I think this is similar to solving anagrams, except that letters are allowed to be used multiple times.

E.g. Letters given: q r b d t e s Possible words: bedders, dessert, etc.

In search of a data structure supporting O(1) for checking if a proposed word is in the dictionary, I found this paper and subsequently also found a working Java DAWG implementation here.

Here's where I am stuck: When trying to generate a list of possible words one can create from these letters, I am wondering if I am missing something using the DAWG implementation. I have seen other threads on here that are suggestion using a Trie and recursively go through the nodes, but I was hoping I could do something similar with the DAWG.

The implementation I currently have working is by going through all the words of the dictionary, then skip any word that contains a letter that is NOT part of the letters earlier assigned. This works, but is slow, going through all the words of the dictionary * ~20 letters worst-case.

for(String word : dawg.getAllStrings()) {
     boolean blacklist = false;
     for(int i = 0; i<nonLetters.length(); i++) {
         if(word.indexOf(nonLetters.charAt(i)) >= 0) {
             blacklist = true;
             break;
         }
     }

     if(!blacklist)
         possibleWordList.add(word);
}

I tried implementing a recursive word match but struggled as the implementation doesn't expose access to its Node implementation, I could locally change this though.

Without access to the node, I can use dawg.contains(letter), but with the requirement to use letters multiple times I am wondering if this would really help. E.g. I would start with 'A', then 'AA', ... stopping at e.g. 20 A's.

Would all this just be easier with a Trie? Still fairly fast to find matching words, but easier to generate a list of possible words?

Are there other DAWG libraries that expose Node traversing or have a ref implementation to generate all possible words?

Thanks for any help or pointers!

Nimo
  • 61
  • 6

2 Answers2

1

I think this is a good way. I added a recursive method to ModifiableDAWGSet of the DAWG implementation referenced in the question.

getWords is called with a String prefix, adding up the characters. I first implemented this to generate all words in the DAWG and compared to the existing method of the ModifiableDAWGSet.getAllStrings(). Then I added to skip words that didn't contain words, not including Characters from the list of possible characters.

private ArrayList<String> allMatchingWords = new ArrayList<String>();
private ArrayList<Character> possibleCharacters;

private void getWords(ModifiableDAWGNode originNode, String prefix) {
    NavigableMap<Character, ModifiableDAWGNode> transitionTreeMap = originNode.getOutgoingTransitions();

    for(Map.Entry<Character, ModifiableDAWGNode> transition : transitionTreeMap.entrySet())
    {
        Character c = transition.getKey();
        if(!possibleCharacters.contains(c))
            continue;

        ModifiableDAWGNode n = transition.getValue();
        if(n.isAcceptNode()) //this is a word
        {
            allMatchingWords.add(prefix + c);
        }
        getWords(n, prefix + c);
    }
}

public ArrayList<String> getAllMatchingWords(Character mustContain, ArrayList<Character> possibleCharacters)
{
    allMatchingWords.clear();
    this.mustContain = mustContain;
    this.possibleCharacters = possibleCharacters;
    getWords(sourceNode, "");
    return allMatchingWords;
}
Nimo
  • 61
  • 6
  • Saw this later. Get the original Project from the Github link above and add the code block to ModifiableDAWGSet. – Nimo May 17 '20 at 17:40
0

I have an idea, I am not sure but hope to help you. First create the dictionary as a work-key, the key will be all the letters that the word contains in alphabetical order. Ex: classic -> acils , letter -> elrt. Similar for random string. You can prepare this for your program. And get everything you need to browse through a dictionary with O (n) complexity

for(Word word : dawg.getAllStrings()){
    if(randomString.contains(word.getKey()))
    possibleWordList.add(word);
}
  • Nice! Thank you, I will give this a try. Shortly after I posted I saw that the DAWG library I am using also has a Map implementation with key/values, so I'll check if I can use that. – Nimo Apr 29 '20 at 18:20
  • Wish you success – Lê Hoàng Dững Apr 29 '20 at 18:24
  • Quick update, unfortunately the main issue in terms of performance seems to be going through the dictionary itself to determine in any way the matching word. It doesn't seem to make much difference if this is done by looking for non-matching letters or checking if the sorted string matches a sorted word. I will look again at some other ideas (Trie seems to be easier here or try to traverse the DAWG in some way). Thanks again for now! – Nimo Apr 30 '20 at 16:23
  • I posted an answer below that is running recursively through the nodes of the DAWG – Nimo May 02 '20 at 08:44