1

Question:

Given an input string and a dictionary of words, find out if the input string can be segmented into a space-separated sequence of dictionary words. See following examples for more details.

Consider the following dictionary 
{ i, like, sam, sung, samsung, mobile, ice, 
  cream, icecream, man, go, mango}

Input:  ilike
Output: Yes 
The string can be segmented as "i like".

Input:  ilikesamsung
Output: Yes
The string can be segmented as "i like samsung" or 
"i like sam sung".

I have tried the following recursive approach:

from timeit import timeit

def wordBreak(wordList, word):
    word = word.replace(" ", "")
    if word == '':
        return True
    else:
        wordLen = len(word)
        return any([(word[:i] in wordList)
                    and wordBreak(wordList, word[i:]) for i in range(1, wordLen+1)])

wordList = ["the", "quick", "fox", "brown"]
word = "the quick brown fox"

print(wordBreak(wordList,word))
print(timeit(lambda: wordBreak(wordList,word))) #12.690028140999999

I also tried using a Trie, which turned out to be way slower after benchmarking the run. Is there an iterative / OOP way of solving the same? Also is my current solution: O(n*(n-1)) in terms of time complexity?

nerdyGuy
  • 306
  • 1
  • 9
  • 1
    I don't think there is a solution that is substantially better. There's no way to solve this other than slogging through the string one character at a time. This is actually O(N!), isn't it? In each recursive step, you need to run through the entire remainder of the list. – Tim Roberts Apr 26 '21 at 01:46
  • Technically speaking, any recursive algorithm can be [reimplemented as iteration](https://stackoverflow.com/questions/159590/way-to-go-from-recursion-to-iteration); you just have to loop and manually handle all the state. I feel like it would be an inelegant and hard-to-read mess in this case, though. One tip: replacing the list comprehension with a [generator expression](https://realpython.com/introduction-to-python-generators/#building-generators-with-generator-expressions) takes up less memory and will usually run at least somewhat faster. – CrazyChucky Apr 26 '21 at 02:12

2 Answers2

1

I didn't see it at first, but there's a very similar way to do this without doing it one letter a time. At each recursion, check if you can remove an entire word at a time off the front of the string, and then just keep going. In an initial test or two, it appears to run a good bit faster.

I think this is the first time I've used the count argument to str.replace.

def word_break(word_list, text):
    text = text.replace(' ', '')
    
    if text == '':
        return True
    
    return any(
        text.startswith(word)
        and word_break(word_list, text.replace(word, '', 1))
        for word in word_list
    )

If you're using Python 3.9+, you can replace text.replace(word, '', 1) with text.removeprefix(word).

I believe it's the same asymptotic complexity, but with a smaller constant (unless the words in your allowed list are all single characters, anyway).

CrazyChucky
  • 3,263
  • 4
  • 11
  • 25
-2

I think the best way to go about this is to use a for-loop in this way:

def wordBreak(wordList, word):
    word = word.replace(" ", "")
    if word == "":
        return True
    #No need for an else statement
    llist = []
    words = []
    for i in range(len(word)):
        llist.append(word[i])
        for j in range(len(llist)):
            if i != j:
                llist[j] += word[i]
            if llist[j] in wordList:
                #print(llist[j])
                words.append(llist[j])
    return words
                
wordList = ["the", "quick", "fox", "brown"]
word = "the quick brown fox"

print(wordBreak(wordList,word))

Although it is a bit lengthier than your original one, it runs much quicker.

Timmy Diehl
  • 409
  • 3
  • 16