1

In this problem we've to split a string into meaningful words. We're given a dictionary to see If the word exists or not.

I"ve seen some other approaches here at How to split a string into words. Ex: "stringintowords" -> "String Into Words"?.

I thought of a different approach and was wondering If it would work or not.

Example- itlookslikeasentence

Algorithm

Each letter of the string corresponds to a node in a DAG.

Initialize a bool array to False.

At each node we have a choice- If the addition of the present letter to the previous subarray still produces a valid word then add it, if it does not then we will begin a new word from that letter and set bool[previous_node]=True indicating that a word ended there. In the above example bool[1] would be set to true.

This is something similar to the maximum subarray sum problem.

Would this algorithm work?

Community
  • 1
  • 1
Sameer
  • 137
  • 1
  • 9

1 Answers1

2

No, it wouldn't. You solution takes the longest possible word at every step, which doesn't always work.

Here is counterexample:

Let's assume that the given string is aturtle. Your algorithm will take a. Then it will take t as at is valid word. atu is not a word, so it'll split the input: at + urtle. However, there is no way to split urtle into a sequence of valid English words. The right answer would be a + turtle.

One of the possible correct solutions uses dynamic programming. We can define a function f such that f(i) = true iff it's possible to split the first i characters of the input into a valid sequence of words. Initially, f(0) = true and the rest of the values are false. There is a transition from f(l) to f(r) if s[l + 1, r] is a valid word for all valid l and r.

P.S. Other types of greedy algorithms would not work here either. For instance, if you take the shortest word instead of the longest one, it fails to work on, for instance, the input atnight: there is no way to split tnight after the a is stripped off, but at + night is clearly a valid answer.

kraskevich
  • 18,368
  • 4
  • 33
  • 45