-1

Suppose I have a string like 'meetateight' and I need to segment it into meaningful words like 'meet' 'at' 'eight' using dynamic programming.

To judge how “good” a block/segment "x = x1x2x3" is, I am given a black box that, on input x, returns a real number quality(x) such that: A large positive value for quality(x) indicates x is close to an English word, and a large negative number indicates x is far from an English word.

I need help with designing an algorithm for the same.

I tried thinking over an algorithm in which I would iteratively add letters based on their quality and segment whenever there is a dip in quality. But this fails in the above example because it cuts out me instead of meet.

I need suggestions for a better algorithm.

Thanks

hippietrail
  • 15,848
  • 18
  • 99
  • 158
parth
  • 272
  • 1
  • 9
  • 20
  • possible duplicate of [Split a string to a string of valid words using Dynamic Programming](http://stackoverflow.com/questions/5310756/split-a-string-to-a-string-of-valid-words-using-dynamic-programming) – Woot4Moo Nov 09 '12 at 13:46
  • This question has been asked before. – Woot4Moo Nov 09 '12 at 13:46

3 Answers3

0

What about building a Trie using an English Dictionary and navigating it down scanning your string with all the possible path to leaf (backtracking when you have more than one choice).

Tony Rad
  • 2,479
  • 20
  • 32
0

You can use dynamic programming, and keep track of the score for each prefix of your input, adding one letter at a time. Each time you add a letter, see if any suffixes can be added on to a prefix you've already used (choosing the one with the best score). For example:

m = 0
me = 1
mee = 0
meet = 1
meeta = 1 (meet + a)
meetat = 1 (meet + at)
meetate = 1 (meet + ate)
meetatei = 1 (meetate + i)
meetateig = 0
meetateigh = 0
meetateight = 1 (meetat + eight)

To handle values between 0 and 1, you can multiply them together. Also save which words you've used so you can split the whole string at the end.

fgb
  • 18,439
  • 2
  • 38
  • 52
-1

I wrote a program to do this at my blog; it's too long to include here. The basic idea is to chop off prefixes that form words, then recursively process the rest of the input, backtracking when it is not possible to split the entire string.

user448810
  • 17,381
  • 4
  • 34
  • 59