0

I have this problem to solve

There is an input word from user which has been formed from two different words like

AppleCake or BrownPie

Now we need to develop a program which will take this input and match it against a library of words and break the word into it's meaningful parts i-e Apple and Cake

Input:AppleCake

Output:This input has two words Apple and Cake

Input: RedGrapesWine

Output: This Input has three words Red, Grapes and Wine

My question is:

How should I start working on this problem?

Can anyone help me with pseudoCode/ Steps towards its solution?

Dave Newton
  • 158,873
  • 26
  • 254
  • 302
Just_another_developer
  • 5,737
  • 12
  • 50
  • 83

4 Answers4

1

A very simple approach that works only if you have little number of words is to iterate through the words list and try to match word by word.

This is a very basic example (does not handle case, nor multiple occurrences of word or whatever), but it shows you how to do:

String input = readFromUser();
String[] dictionary = new String[] { "Apple", "Cake" };
List<String> found = new ArrayList<>();
for (String word : dictionary) {
    int index = input.indexOf(word);
    if (index >= 0) {
        input = input.substring(0, index) + input.substring(index + word.length());
        found.add(word);
    }
}
System.out.println("Found " + found.size() + " words: " + found);

This is very simple approach since its time consuming.

Another approach would be to have a Trie and navigate it until you find the right word (should be a better approach).

mkhelif
  • 1,551
  • 10
  • 18
1

To improve the algorithm, you should first create a set containig all the word-beginings your dictionary contains. If "Apple" and "Cake" are in the dictionary, the set has to contain "A", "Ap", "App", "Appl", "Apple", "C", "Ca" and "Cake".

So you will see sooner if a token cannot be a word, as it begining do not matches with the begining of a known word.

Olivier Faucheux
  • 2,520
  • 3
  • 29
  • 37
0

If new words are using capital letters, you can use it to break word into pieces you want.

Tom11
  • 2,419
  • 8
  • 30
  • 56
0

A simple solution will be to test every possible partition against a hashmap/dictionary.

e.g.

thebody -> t hebody (do t and hebody exist?), th ebody (th and ebody?), the body (the and body?), etc.

james.haggerty
  • 1,170
  • 6
  • 10