0

I have a dictionary with many words. And i hope search the longest concatenated word (that is, the longest word that is comprised entirely of shorter words in the file). I give the method a descending word from their length. How can I check that all the symbols have been used from the dictionary?

 public boolean tryMatch(String s, List dictionary) {
    String nextWord = new String();
    int contaned = 0;

    //Цикл перебирающий каждое слово словаря
        for(int i = 1; i < dictionary.size();i++) {

            nextWord = (String) dictionary.get(i);
            if (nextWord == s) {
                nextWord = (String) dictionary.get(i + 1);
            }

            if (s.contains(nextWord)) {

                contaned++;
            }

        }

    if(contaned >1) {
        return true;
    }
    return false;
}
  • 3
    Don't compare strings with `==`. – Marvin Jan 12 '18 at 22:21
  • 1
    Some background information to @Marvin's comment: [How do I compare strings in Java?](https://stackoverflow.com/questions/513832/how-do-i-compare-strings-in-java) (this is not only valid for `String`s, but for any kind of object in Java). – Turing85 Jan 12 '18 at 22:33

1 Answers1

1

If you have a sorted list of words, finding compound words is easy, but it will only perform well if the words are in a Set.

Let's look at the compound word football, and of course assume that both ball and foot are in the work list.

By definition, any compound word using foot as the first sub-word must start with foot.

So, when iterating the list, remember the current active "stem" words, e.g. when seeing foot, remember it.

Now, when seeing football, you check if the word starts with the stem word. If not, clear the stem word, and make new word the stem word.

If it does, the new word (football) is a candidate for being a compound word. The part after the stem is ball, so we need to check if that is a word, and if so, we found a compound word.

Checking is easy for simple case, i.e. wordSet.contains(remain).

However, compound words can be made up of more than 2 words, e.g. whatsoever. So after finding that it is a candidate from the stem word what, the remain is soever.

You can simply try all lengths of that (soever, soeve, soev, soe, so, s), and if one of the shorter ones are words, you repeat the process.

Andreas
  • 154,647
  • 11
  • 152
  • 247