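How can I accurately split sentences in Python? I tried NLTK, but it fails on some sentences. In particular, it does not correctly split sentences that contain parentheses and citations, such as a title ending in an exclamation mark followed by a year in parentheses (`Bandolero! (1968)`).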
import nltk.data
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
paragraph_text = 'Fans often re-watch films and may misidentify actors, so it is essential to pay close attention to details to avoid confusion! In addition to her other notable works, Raquel Welch starred in films such as Fathom (1967), Bandolero! (1968), 100 Rifles (1969), and Myra Breckinridge (1970).'
sentences = tokenizer.tokenize(paragraph_text)
print(sentences)
My code's output:
['Fans often re-watch films and may misidentify actors, so it is essential to pay close attention to details to avoid confusion!', 'In addition to her other notable works, Raquel Welch starred in films such as Fathom (1967), Bandolero!', '(1968), 100 Rifles (1969), and Myra Breckinridge (1970).']
My desired output:
['Fans often re-watch films and may misidentify actors, so it is essential to pay close attention to details to avoid confusion!', 'In addition to her other notable works, Raquel Welch starred in films such as Fathom (1967), Bandolero! (1968), 100 Rifles (1969), and Myra Breckinridge (1970).']
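One possible workaround is to post-process the tokenizer's output and rejoin pieces that clearly cannot start a sentence. This is a heuristic of my own, not an NLTK setting, and the function name `merge_fragments` is hypothetical. It assumes that a piece starting with `(`, `[`, a digit, or a lowercase letter is a continuation of the previous piece. It also assumes the pieces were separated by a single space in the original text.

```python
import re
from typing import List

# Hypothetical helper (not part of NLTK): rejoin pieces that a sentence
# tokenizer split too eagerly. Heuristic assumption: a piece that starts with
# "(", "[", a digit, or a lowercase letter continues the previous piece.
CONTINUATION_START = re.compile(r"[(\[0-9a-z]")


def merge_fragments(sentences: List[str]) -> List[str]:
    merged: List[str] = []
    for piece in sentences:
        if merged and CONTINUATION_START.match(piece):
            # Assumes a single space separated the pieces in the source text.
            merged[-1] = merged[-1] + " " + piece
        else:
            merged.append(piece)
    return merged


# The tokenizer output from the question, copied as a literal so this runs
# without NLTK installed.
nltk_output = [
    "Fans often re-watch films and may misidentify actors, so it is essential to pay close attention to details to avoid confusion!",
    "In addition to her other notable works, Raquel Welch starred in films such as Fathom (1967), Bandolero!",
    "(1968), 100 Rifles (1969), and Myra Breckinridge (1970).",
]

print(merge_fragments(nltk_output))
```

With NLTK in the loop, the call would be `merge_fragments(tokenizer.tokenize(paragraph_text))`. The rule is deliberately crude. A real sentence that begins with a number (for example, "100 Rifles was released in 1969.") would be wrongly glued onto the previous sentence, so adjust the pattern to fit your data.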