Say I have the code `txt = "Hello my name is bob. I really like pies."`. How would I extract each sentence individually and add them to a list? I created this messy script, which gives me a rough count of the sentences in a string...

sentences = 0
capitals = [
    'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S',
    'T','U','V','W','X','Y','Z'
]
finish_markers = [
    '.','?','!'
]
newTxt = txt.split()
for x in newTxt[1:-1]:
    for caps in capitals:
        if caps in x:
            for fin in finish_markers:
                if fin in newTxt[newTxt.index(x) - 1]:
                    sentences += 1
for caps in capitals:
    if caps in newTxt[0]:
        sentences += 1
print("Sentence count...")
print(sentences)

It uses the txt variable mentioned above. However, I would now like to extract each sentence and put them into a list, so the final product would look something like this...

['Hello my name is bob.','I really like pies.']

I would prefer not to use any non-standard packages because I want this script to work offline and independently of everything else. Thank you for any help!

  • Split string by `.` ? --> `"Hello my name is bob. I really like pies.".split(".")` – Rakesh Jul 03 '19 at 11:47
  • @Rakesh, does not always work. For ex: "This question is tagged python-3.x" - will be split into two. You might want [nltk](http://www.nltk.org/). – Austin Jul 03 '19 at 11:51
  • Possible duplicate of [Python split text on sentences](https://stackoverflow.com/questions/4576077/python-split-text-on-sentences) – Austin Jul 03 '19 at 11:53
  • @Austin if that can be the case, then what would be the approach to identifying the end of a sentence? – 0xPrateek Jul 03 '19 at 11:53
  • Thank you. For now I will use the first option by @Rakesh because it is working. I will look into `nltk`, but as I said, I am trying to avoid using any extra packages. –  Jul 03 '19 at 11:55
  • @0xPrateek, see the link above. – Austin Jul 03 '19 at 11:55
  • Also could be duplicate of https://stackoverflow.com/questions/4998629/split-string-with-multiple-delimiters-in-python – Tomerikoo Jul 03 '19 at 11:57

3 Answers

Use nltk.tokenize

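import nltk
# One-time setup: sent_tokenize needs the Punkt tokenizer data,
# e.g. nltk.download('punkt') (the resource name may differ by NLTK version).
sentences = nltk.sent_tokenize(txt)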

This will give you a list of sentences.

pedram bashiri

You could use a regex that matches any of the sentence-ending characters (".", "?", "!") and split the text into separate strings at those points.
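As a rough sketch of that idea using only the standard library `re` module: the helper name `split_sentences` is made up for illustration, and the rule it applies (a sentence ends at ".", "?" or "!" followed by whitespace) is an assumption, not a complete definition of a sentence.

```python
import re

# Assumption: a sentence boundary is whitespace that directly follows '.', '?' or '!'.
# The lookbehind keeps the punctuation attached to the sentence it ends.
_SENTENCE_END = re.compile(r'(?<=[.?!])\s+')

def split_sentences(text):
    """Split text into sentences at whitespace following '.', '?' or '!'."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]

txt = "Hello my name is bob. I really like pies."
print(split_sentences(txt))
# ['Hello my name is bob.', 'I really like pies.']
```

Because the pattern requires whitespace after the punctuation, a string like "This question is tagged python-3.x" stays in one piece. Abbreviations such as "Dr. Smith" would still be split incorrectly, so treat this as a heuristic.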

You are trying to split a string into sentences, which is hard to do reliably with regular expressions or plain string handling. For your use case, I'd recommend an NLP library like NLTK. Then take a look at "Tokenize a paragraph into sentence and then into words in NLTK".

Cartucho