I would like to parse sentences from text. I do not want to use NLP libraries, as they probably do not support my language.

My idea is something like this:

sentence_begin = space_AnyCapitalLetter
sentence_middle = minimum_5_letters
sentence_end = ". " or "? " or "! "

sentence = sentence_begin + sentence_middle + sentence_end

Unfortunately, I have no idea how to turn this into working code.

Another idea I have is to use the .split function, like this:

x = any capital letter                 (don't know how to set this)
text.split(". x" or "? x" or "! x")    (don't know how to pass several options to split, as "or" probably does not work)

Thanks for any help. Maybe this is a completely wrong approach, in which case I would be happy with any other suggestion.

  • Does this answer your question? [Split Strings into words with multiple word boundary delimiters](https://stackoverflow.com/questions/1059559/split-strings-into-words-with-multiple-word-boundary-delimiters) – Python learner Jan 09 '22 at 08:27
  • I found a solution using regex, as described here: https://stackoverflow.com/questions/25735644/python-regex-for-splitting-text-into-sentences-sentence-tokenizing – user3306642 Jan 18 '22 at 08:39

1 Answer

You can use the str.replace method: string.replace(oldvalue, newvalue)

Replace each sentence-ending character with the same character followed by a newline, \n.

So, to add a newline after ?, we would use replace('?', '?\n')

sentences = 'Sentece1 Sentece1 Sentece1 Sentece1.Sentece2 Sentece2 Sentece2 Sentece2?Sentece3 Sentece3 Sentece3 Sentece3!'
new = sentences.replace('?', '?\n').replace('.', '.\n').replace('!', '!\n')
print(new)

Results in:

Sentece1 Sentece1 Sentece1 Sentece1.
Sentece2 Sentece2 Sentece2 Sentece2?
Sentece3 Sentece3 Sentece3 Sentece3!
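
If a list of sentences is more convenient than one multi-line string, the result of the replace calls can be broken up with str.splitlines(). This is only a small sketch; the sample text and the variable names are made up for illustration.

```python
# Illustrative sample text (not from the question).
sentences = 'First one here.Second one here?Third one here!'

# Put a newline after every sentence-ending character, as above.
marked = sentences.replace('?', '?\n').replace('.', '.\n').replace('!', '!\n')

# splitlines() returns one list item per line and does not produce
# an empty item for the trailing newline.
sentence_list = marked.splitlines()
print(sentence_list)
```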

If you only want a newline between the sentences and do not want to keep the punctuation characters, replace each character with just \n.

sentences = 'Sentece1 Sentece1 Sentece1 Sentece1.Sentece2 Sentece2 Sentece2 Sentece2?Sentece3 Sentece3 Sentece3 Sentece3!'
new = sentences.replace('?', '\n').replace('.', '\n').replace('!', '\n')
print(new)

Results in:

Sentece1 Sentece1 Sentece1 Sentece1
Sentece2 Sentece2 Sentece2 Sentece2
Sentece3 Sentece3 Sentece3 Sentece3
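
Note that replace splits at every ., ? or !, whatever surrounds it. If a chunk should only count as a sentence when it meets the conditions listed in the question (starts with a capital letter, is at least 5 characters long, ends with ., ? or !), a regular expression split followed by a filter is one possible approach. This is a different technique from replace, and only a rough sketch: the function name extract_sentences, the min_length parameter and the sample text are hypothetical, and it assumes sentences are separated by whitespace after the punctuation mark.

```python
import re

def extract_sentences(text, min_length=5):
    """Return the chunks of text that look like sentences.

    A chunk is kept only if it starts with an uppercase letter, is at
    least min_length characters long, and ends with '.', '?' or '!'.
    """
    # Split at whitespace that follows '.', '?' or '!'. The lookbehind
    # keeps the punctuation attached to the preceding chunk.
    candidates = re.split(r'(?<=[.!?])\s+', text.strip())
    return [
        c for c in candidates
        if c                          # skip empty chunks
        and len(c) >= min_length      # long enough
        and c[0].isupper()            # str.isupper() also handles non-ASCII letters
        and c[-1] in '.!?'            # proper sentence ending
    ]

# Illustrative sample text (not from the question).
text = ('Hello there, world! Ok. Über uns steht nichts? '
        'this starts in lowercase. No ending here')
print(extract_sentences(text))
# ['Hello there, world!', 'Über uns steht nichts?']
```

Chunks that fail a condition are simply dropped here. Abbreviations such as "e.g. Something" would still be split, so real text may need extra rules.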
infin1tum
  • Thanks, but I actually need to produce a "sentence" only when all conditions are met. So it must begin with a capital letter, be at least 5 characters long, and end with !, ? or . – user3306642 Jan 18 '22 at 08:38