How can I accurately split text into sentences in Python? I tried NLTK, but it splits some sentences incorrectly, in particular those containing parentheses and citations.

import nltk.data

tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
paragraph_text = 'Fans often re-watch films and may misidentify actors, so it is essential to pay close attention to details to avoid confusion! In addition to her other notable works, Raquel Welch starred in films such as Fathom (1967), Bandolero! (1968), 100 Rifles (1969), and Myra Breckinridge (1970).'
sentences = tokenizer.tokenize(paragraph_text)
print(sentences)

My code's output:

['Fans often re-watch films and may misidentify actors, so it is essential to pay close attention to details to avoid confusion!', 'In addition to her other notable works, Raquel Welch starred in films such as Fathom (1967), Bandolero!', '(1968), 100 Rifles (1969), and Myra Breckinridge (1970).']

My desired output:

['Fans often re-watch films and may misidentify actors, so it is essential to pay close attention to details to avoid confusion!', 'In addition to her other notable works, Raquel Welch starred in films such as Fathom (1967), Bandolero! (1968), 100 Rifles (1969), and Myra Breckinridge (1970).']
desertnaut
  • Your problem is not the parentheses; it's the `!` after `Bandolero` within the sentence. You need a more advanced algorithm, such as an LLM like GPT-3. ChatGPT can correctly split this example. – Selcuk May 11 '23 at 01:27
  • Did you see this question? https://stackoverflow.com/questions/4576077/how-can-i-split-a-text-into-sentences – imad.nyc May 11 '23 at 01:54
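
As the first comment notes, the split is triggered by the `!` in `Bandolero!`, not by the parentheses. One lightweight workaround, short of switching tools, is to post-process the tokenizer's output and re-join any fragment that begins with a parenthesized year. The sketch below is only a heuristic: the helper name `merge_year_fragments` and the four-digit-year pattern are assumptions made for illustration, not part of NLTK, and a genuine sentence that opens with something like `(1968)` would be merged by mistake. To stay self-contained it works on the tokenizer output shown in the question rather than calling NLTK.

```python
import re

# Assumption: a fragment starting with a parenthesized four-digit year,
# e.g. "(1968), ...", is the tail of the previous sentence, not a new one.
YEAR_FRAGMENT = re.compile(r"\(\d{4}\)")


def merge_year_fragments(sentences):
    """Re-join fragments that were split off just before a '(YYYY)' token."""
    merged = []
    for sentence in sentences:
        fragment = sentence.lstrip()
        if merged and YEAR_FRAGMENT.match(fragment):
            # Glue the fragment back onto the preceding sentence.
            merged[-1] = merged[-1] + " " + fragment
        else:
            merged.append(sentence)
    return merged


# The tokenizer output from the question, copied verbatim.
tokenized = [
    'Fans often re-watch films and may misidentify actors, so it is essential to pay close attention to details to avoid confusion!',
    'In addition to her other notable works, Raquel Welch starred in films such as Fathom (1967), Bandolero!',
    '(1968), 100 Rifles (1969), and Myra Breckinridge (1970).',
]

fixed = merge_year_fragments(tokenized)
print(fixed)
```

In practice, `tokenized` would be replaced by `tokenizer.tokenize(paragraph_text)`. This only covers titles followed by a year in parentheses; other mid-sentence `!` or `?` cases would need their own rules or a different splitter.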

0 Answers