
I need to tokenize a sentence without using regex or any imported module, only the built-in split() method. The function should take a text as input and return a list containing the sentences in the text, delimited by '?', '!', and '.'. An example would be:

>>> t = "Are you out of your mind? I can't believe it! I'm so disappointed."
>>> get_sentences(t)
['Are you out of your mind', "I can't believe it", "I'm so disappointed"]

Here is my work so far:

def get_sentences(text):
    l1 = text.split('.')
    for l2 in l1:
        l2 = l2.split('!')
        for l3 in l2:
            l3 = l3.split('?')
    return l1

Any help, please?

1 Answer


One way to solve the problem is to split the text progressively, using one separator at a time, and then combine the fragments with either sum() or itertools.chain(). The latter is much faster, but itertools must be imported. The order of the separators does not matter. Stripping removes the unwanted whitespace between the sentences.

sents = sum([sum([[z.strip() for z in y.split("?")]
                  for y in x.split("!")], [])
             for x in t.split(".")], [])
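For comparison, a chain()-based version might look like the sketch below. Note that itertools is part of the standard library, but it still needs an import, which the question rules out:

from itertools import chain

# Split on '.', flatten the '!'-splits into one stream,
# then split each piece on '?' and strip it.
parts = chain.from_iterable(x.split("!") for x in t.split("."))
sents = [z.strip() for y in parts for z in y.split("?")]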

There may be empty-string leftovers in the output. Get rid of them:

sents = [sent for sent in sents if sent]
#['Are you out of your mind', "I can't believe it", 
# "I'm so disappointed"]
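Alternatively, still without any import, the flattening, stripping, and filtering can all be folded into one comprehension, avoiding sum() altogether; a sketch:

sents = [z.strip()
         for x in t.split(".")
         for y in x.split("!")
         for z in y.split("?")
         if z.strip()]              # skip empty leftovers
# ['Are you out of your mind', "I can't believe it",
#  "I'm so disappointed"]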
DYZ
    You ended up with 4 items; solution should have 3. – Scott Hunter Nov 28 '20 at 23:14
  • thanks, do you know any other way to do that without sum() or itertools.chain()? (I'm pretty limited in the use of modules and functions...) Is there a way to solve the problem by using loops and splits? – Taha Rhaouti Nov 28 '20 at 23:15
  • `sum()` is a built-in function. What's wrong with it? – DYZ Nov 28 '20 at 23:16
  • If you cannot use the built-in function (you should have warned us), you can write your own based on [this answer](https://stackoverflow.com/a/14278710/4492932). – DYZ Nov 28 '20 at 23:18
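For reference, a loop-and-split version along the lines of that last comment might look like this sketch, using only built-in methods:

def get_sentences(text):
    sents = []
    for part in text.split("."):             # split on the first delimiter
        for subpart in part.split("!"):      # then on the second
            for sent in subpart.split("?"):  # and on the third
                sent = sent.strip()          # drop surrounding whitespace
                if sent:                     # skip empty leftovers
                    sents.append(sent)
    return sents

>>> get_sentences(t)
['Are you out of your mind', "I can't believe it", "I'm so disappointed"]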