guys. I am a novice at Python 3.8. I need to split a text into sentences while keeping the punctuation. The text can also contain consecutive punctuation marks. I haven't learnt regular expressions yet, so is there a way to do this with simple tools like find() and string slicing? I tried find() and slices, but my code is not one-size-fits-all. Looking forward to better ways of using find() and slicing. Thanks.

Young
  • Does this answer your question? [How can I split a text into sentences?](https://stackoverflow.com/questions/4576077/how-can-i-split-a-text-into-sentences) – FinleyGibson Dec 01 '20 at 12:19
  • No, but thanks. :) I have already handed in my homework and am waiting for our teacher's feedback. Maybe this will be easier for me after learning regular expressions. Thanks a loooooooot! – Young Dec 02 '20 at 00:39

2 Answers

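You can use split('. ') to split the string into a list of pieces, using a period followed by a space as the separator.

Since split() drops the separator, append '. ' to every list element to restore the punctuation. Note that this only handles periods: a sentence ending in ? is not split off, and the last element ends up with a doubled period because the text already ends in one.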

>>> text = "guys. I am a novice at Python 3.8. I need to split a text into sentences while keeping the punctuations. You can also encounter countinuous punctuations. I haven't learnt regular expression so is there a way using simple codes like find() and slicing strings to do this? I tried find() and slices but it is not a one-size-fits-all code. Looking forward to better ways of using find() and slice. Thanks."
>>> sentences = [f'{i}. ' for i in text.split('. ')]
>>> sentences
['guys. ', 'I am a novice at Python 3.8. ', 'I need to split a text into sentences while keeping the punctuations. ', 'You can also encounter countinuous punctuations. ', "I haven't learnt regular expression so is there a way using simple codes like find() and slicing strings to do this? I tried find() and slices but it is not a one-size-fits-all code. ", 'Looking forward to better ways of using find() and slice. ', 'Thanks.. ']
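
The split('. ') approach above only knows about periods. As a rough sketch of a regex-free alternative, the function below scans the text character by character and cuts it with slices. It does not use find(), since find() searches for only one substring at a time. The name split_sentences and the default puncts value are hypothetical choices for illustration, and the sketch assumes a sentence ends only when its punctuation is followed by whitespace or the end of the text, which would not hold for text such as Chinese where no space follows 。.

```python
def split_sentences(text, puncts=".!?"):
    """Split text into sentences, keeping trailing punctuation (no regex)."""
    sentences = []
    start = 0  # index where the current sentence begins
    i = 0
    while i < len(text):
        if text[i] in puncts:
            # Absorb a run of consecutive punctuation such as "?!" or "..."
            end = i + 1
            while end < len(text) and text[end] in puncts:
                end += 1
            # Assumption: a sentence ends only if whitespace or the end of
            # the text follows, so the period in "3.8" does not split.
            if end == len(text) or text[end].isspace():
                sentence = text[start:end].strip()
                if sentence:
                    sentences.append(sentence)
                start = end
            i = end
        else:
            i += 1
    # Keep any trailing text that has no closing punctuation
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("I use Python 3.8. Really?! Yes... ok"))
# ['I use Python 3.8.', 'Really?!', 'Yes...', 'ok']
```

Other sentence-ending marks can be passed in through puncts, for example puncts=set(';!?.').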
theWellHopeErr
  • Yup, yeah, this works. It isn't great for large blocks of text. +1 – Tim Dec 01 '20 at 07:07
  • :) Many thanks! I should have refreshed this page earlier. But what if a sentence can end with other kinds of punctuation, e.g. puncts = set(';。!…?')? – Young Dec 01 '20 at 07:49

If you have a large block of text, you will likely want to use a generator so that you are not copying the text many times.

For example:

import re

paragraph = """
This is a sentence. This may be another one. I am not sure.
"""

sentence_regex = r'[^.]+\.'
# match one or more non-period characters, followed by a literal period


def find_sentences(text):
    for match in re.finditer(sentence_regex, text):
        yield match.group(0).strip()


for sentence in find_sentences(paragraph):
    print(sentence)

Execution:

[ttucker@zim stackoverflow]$ python sentence.py 
This is a sentence.
This may be another one.
I am not sure.
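
The pattern above stops at every period, so it would also cut inside a number like 3.8 and would ignore ? and !. One possible variation, offered only as a sketch, is shown below. The names SENTENCE_RE and iter_sentences are hypothetical, and the pattern assumes that a sentence ends with a run of ., ! or ? followed by whitespace or the end of the text. Text after the last such run is dropped.

```python
import re

# Assumed rule: start at a non-space character, lazily extend to a run of
# . ! ? that is followed by whitespace or the end of the text.
SENTENCE_RE = re.compile(r'\S.*?[.!?]+(?=\s|$)', re.DOTALL)


def iter_sentences(text):
    """Yield sentences one at a time, keeping their punctuation."""
    for match in SENTENCE_RE.finditer(text):
        yield match.group(0)


for sentence in iter_sentences("I use Python 3.8. Really?! Yes... Done."):
    print(sentence)
# I use Python 3.8.
# Really?!
# Yes...
# Done.
```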
Tim
  • Yep, this is where the problem lies. We are gonna learn RE next class. T T but thanks a lot! You made my day. – Young Dec 01 '20 at 07:49