I'm trying to find a good regex for sentence-end detection in Java. The main issue is that when a number is followed by a period, the regex detects it as a sentence end (see the demo link). In my case, I'd prefer that it not be recognized as a sentence end, even though in some cases it might be one. What I see more commonly in documents are section headers that look like this:
12. the end of the world 13. world didnt end 14. nope it did
In my case, it's splitting a lot of simple header listings into sentences, which I don't want.
Additional issue with the solution posted here:
The proposed solution is: [!?.]+(?=$|\s)
See demo: http://regex101.com/r/lS5tT3/15
The issue is that a chapter heading such as 15. is incorrectly treated as a sentence end. Try this text in the demo and you will see the issue in the first sentence:
This is the f!!rst *15.* the best sentence! Is this the second one? The third 32.5 sentence is here... And the fourth one!!
If any regex whizzes can help add the logic that a period followed by a space is not a sentence end when it is preceded by a number, that would be quite helpful.
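For reference, here is a rough sketch of the kind of thing I have in mind, not a confirmed solution. It assumes a negative lookbehind `(?<!\d)` placed in front of the proposed pattern is enough to skip punctuation that directly follows a digit. The class name `SentenceSplitter` and the method `split` are made up for illustration, and the sample text is the demo text with the asterisks left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class SentenceSplitter {
    // Terminal punctuation followed by whitespace or end of input,
    // but NOT directly preceded by a digit (assumption: "15. " is a heading).
    static final Pattern SENTENCE_END =
            Pattern.compile("(?<!\\d)[!?.]+(?=$|\\s)");

    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE_END.matcher(text);
        int start = 0;
        while (m.find()) {
            // Each sentence runs from the previous end through the punctuation.
            String s = text.substring(start, m.end()).trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
            start = m.end();
        }
        // Keep any trailing text that has no terminal punctuation.
        String tail = text.substring(start).trim();
        if (!tail.isEmpty()) {
            sentences.add(tail);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "This is the f!!rst 15. the best sentence! "
                + "Is this the second one? "
                + "The third 32.5 sentence is here... And the fourth one!!";
        split(text).forEach(System.out::println);
    }
}
```

The known trade-off is that a real sentence ending in a number (for example "It happened in 2014.") would no longer be split there, which is acceptable for my documents but may not be for others.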