Using a user inputted string of characters find the longest word that can be made

Question

Basically I want to create a program which simulates the 'Countdown' game on Channel 4. In effect a user must input 9 letters and the program will search for the largest word in the dictionary that can be made from these letters.I think a tree structure would be better to go with rather than hash tables. I already have a file which contains the words in the dictionary and will be using file io.

This is my file io class:

public static void main(String[] args){
     FileIO reader = new FileIO();
     String[] contents = reader.load("dictionary.txt");
}

This is what I have so far in my Countdown class

public static void main(String[] args) throws IOException{
     Scanner scan = new Scanner(System.in);
     letters = scan.NextLine();
}

I get totally lost from here. I know this is only the start but I'm not looking for answers. I just want a small bit of help and maybe a pointer in the right direction. I'm only new to java and found this question in an interview book and thought I should give it a .

Thanks in advance

I think you can use a [suffix tree](http://en.wikipedia.org/wiki/Suffix_tree) and a [string search](http://en.wikipedia.org/wiki/String_search#Index_methods) to do that. — Elliott Frisch, Mar 03 '14 at 20:15
Just to clarify - is it legitimate to have the same letter twice in a word, if it only appears once in your set of nine? — Dawood ibn Kareem, Mar 03 '14 at 21:19
You've picked a rather difficult problem to get started on, especially if you're talking about a real dictionary (i.e. hundreds of thousands of words). This is typically done with something called a Directed Acyclic Word Graph (DAWG), which is a reasonably advanced topic. — Jim Mischel, Mar 03 '14 at 23:19

score 0 · Answer 1 · answered Mar 03 '14 at 20:13

welcome to the world of Java :)

The first thing I see there that you have two main methods, you don't actually need that. Your program will have a single entry point in most cases then it does all its logic and handles user input and everything.

You're thinking of a tree structure which is good, though there might be a better idea to store this. Try this: http://en.wikipedia.org/wiki/Trie

What your program has to do is read all the words from the file line by line, and in this process build your data structure, the tree. When that's done you can ask the user for input and after the input is entered you can search the tree.

Since you asked specifically not to provide answers I won't put code here, but feel free to ask if you're unclear about something

I was going to suggest Trie as well, with each node holding a character from the words in the dictionary. Your program could then traverse the nodes according to which letters were not yet used at each step. — Mar, Mar 03 '14 at 20:16
A specific kind of trie, the Directed Acyclic Word Graph, is a well-known data structure for this type of problem. — Jim Mischel, Mar 03 '14 at 23:20

score 0 · Answer 2 · answered Mar 03 '14 at 20:16

0

There are only about 800,000 words in the English language, so an efficient solution would be to store those 800,000 words as 800,000 arrays of 26 1-byte integers that count how many times each letter is used in the word, and then for an input 9 characters you convert to similar 26 integer count format for the query, and then a word can be formed from the query letters if the query vector is greater than or equal to the word-vector component-wise. You could easily process on the order of 100 queries per second this way.

answered Mar 03 '14 at 20:16

user2566092

4,631
15
20

That doesn't sound terribly efficient. Unless I'm missing something. Seems like a search would be at least O(n), and for each word you'd have to check a lot of combinations of letters. Can you explain a bit how searching would work? – Jim Mischel Mar 03 '14 at 23:18
@JimMischel For each word, you don't have to check a bunch of combinations of letters, you just compare the letter-count vector for the word to the letter-count vector for the query letters. If the query letters gives letter counts greater than or equal to the word's letter counts, then the word can be formed from the letters in the query. Total complexity per query is O(nA) for n words and an alphabet of size A. Should be able to do a hundred queries per second or more. – user2566092 Mar 04 '14 at 18:32
An O(nA) search is pretty expensive. There are much more efficient ways to do this. – Jim Mischel Mar 04 '14 at 18:56
@JimMischel n = 800k and A = 26 and the hidden constants are very low so this will enable hundreds of queries per second. OP is asking about the actual english language, not some hypothetical language with huge n and A, and this will enable hundreds of queries per second. If OP wants it faster, it should be stated; otherwise this seems like a perfectly good solution albeit simple. Actually when simple is good enough, simple is preferred imo. – user2566092 Mar 05 '14 at 22:09

score 0 · Answer 3 · answered Mar 03 '14 at 20:19

I would write a program that starts with all the two-letter words, then does the three-letter words, the four-letter words and so on.

When you do the two-letter words, you'll want some way of picking the first letter, then picking the second letter from what remains. You'll probably want to use recursion for this part. Lastly, you'll check it against the dictionary. Try to write it in a way that means you can re-use the same code for the three-letter words.

dnl-blkv · Answer 4 · 2014-03-03T22:16:50.957

0

I believe, the power of Regular Expressions would come in handy in your case:

1) Create a regular expression string with a symbol class like: /^[abcdefghi]*$/ with your letters inside instead of "abcdefghi".

2) Use that regular expression as a filter to get a strings array from your text file.

3) Sort it by length. The longest word is what you need!

Check the Regular Expressions Reference for more information.

UPD: Here is a good Java Regex Tutorial.

edited Mar 03 '14 at 22:16

answered Mar 03 '14 at 20:19

dnl-blkv

2,037
1
17
22

`/^[abcdefghi]*$/` will match everything, even words that don't contain those letters. You'll need to replace that `*` with `+`. Even so, your method has to look at every word. Not going to be very efficient if you want to do hundreds of lookups per second. – Jim Mischel Mar 04 '14 at 04:56

score 0 · Answer 5 · edited May 23 '17 at 11:49

A first approach could be using a tree with all the letters present in the wordlist.

If one node is the end of a word, then is marked as an end-of-word node.

Tree

In the picture above, the longest word is banana. But there are other words, like ball, ban, or banal.

So, a node must have:

A character
If it is the end of a word
A list of children. (max 26)

The insertion algorithm is very simple: In each step we "cut" the first character of the word until the word has no more characters.

public class TreeNode {

    public char c;
    private boolean isEndOfWord = false;
    private TreeNode[] children = new TreeNode[26];

    public TreeNode(char c) {
        this.c = c;
    }

    public void put(String s) {
        if (s.isEmpty())
        {
            this.isEndOfWord = true;
            return;
        }
        char first = s.charAt(0);
        int pos = position(first);
        if (this.children[pos] == null)
            this.children[pos] = new TreeNode(first);

        this.children[pos].put(s.substring(1));
    }

    public String search(char[] letters) {
        String word = "";
        String w = "";

        for (int i = 0; i < letters.length; i++)
        {
            TreeNode child = children[position(letters[i])];
            if (child != null)
                w = child.search(letters);
               //this is not efficient. It should be optimized.
            if (w.contains("%")
                    && w.substring(0, w.lastIndexOf("%")).length() > word
                            .length())
                word = w;
        }
            // if a node its end-of-word we add the special char '%'
        return c + (this.isEndOfWord ? "%" : "") + word;
    }
    //if 'a' returns 0, if 'b' returns 1...etc
    public static int position(char c) {
        return ((byte) c) - 97;
    }


}

Example:

public static void main(String[] args) {
    //root
    TreeNode t = new TreeNode('R');
    //for skipping words with "'" in the wordlist
    Pattern p = Pattern.compile(".*\\W+.*");
    int nw = 0;
    try (BufferedReader br = new BufferedReader(new FileReader(
            "files/wordsEn.txt")))
    {
        for (String line; (line = br.readLine()) != null;)
        {
            if (p.matcher(line).find())
                continue;
            t.put(line);
            nw++;
        }
        // line is not visible here.
        br.close();
        System.out.println("number of words : " + nw);
        String res = null;
        // substring (1) because of the root
        res = t.search("vuetsrcanoli".toCharArray()).substring(1);
        System.out.println(res.replace("%", ""));
    }

    catch (Exception e)
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

}

Output:

number of words : 109563
counterrevolutionaries

Notes:

The wordlist is taken from here
the reading part is based on another SO question : How to read a large text file line by line using Java?

Using a user inputted string of characters find the longest word that can be made

5 Answers5