0

I'm trying to split a paragraph into sentences. At the moment I'm splitting on ., which works fine, but I can't get it to split correctly when a sentence can end with ., ? or !.

So far my code is:

String[] sentences = everything.split("(?<=[a-z])\\.\\s+");

Thanks

dave
magna_nz

2 Answers

2

If you don't want to remove the ., ! or ? from the results, split on the whitespace that follows it and match the punctuation inside the lookbehind:

    String[] sentences = everything.split("(?<=[a-z][!?.])\\s+"); 
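To see what this pattern does, here is a minimal sketch. The class name, method name and sample text are made up for illustration. Because the lookbehind `[a-z]` only accepts a lowercase letter before the punctuation, a sentence that ends in a digit or an uppercase letter would not be split.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original question.
class SentenceSplitKeep {

    // Splits on whitespace that follows a lowercase letter plus ., ! or ?.
    // The lookbehind consumes nothing, so the punctuation stays in each sentence.
    static String[] splitKeepingPunctuation(String everything) {
        return everything.split("(?<=[a-z][!?.])\\s+");
    }

    public static void main(String[] args) {
        String everything = "Hello there. How are you? Fine!";
        // prints [Hello there., How are you?, Fine!]
        System.out.println(Arrays.toString(splitKeepingPunctuation(everything)));
    }
}
```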
0

Use a character class. You don't need the lookbehind; use a word boundary instead:

String[] sentences = everything.split("\\b[.!?]\\s+");

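"[.!?]" means "either ., ! or ?". The word boundary \b requires that a word character precede the end-of-sentence character. Note that the punctuation is part of the matched delimiter here, so it is removed from the results.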
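A small sketch of the word-boundary version, with a made-up class name, method name and sample text. The final sentence keeps its `!` because no whitespace follows it, so the pattern does not match there.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original answer.
class SentenceSplitDrop {

    // Splits on ., ! or ? that follows a word character and is followed by whitespace.
    // The punctuation is part of the delimiter, so it is dropped from the pieces.
    static String[] splitDroppingPunctuation(String everything) {
        return everything.split("\\b[.!?]\\s+");
    }

    public static void main(String[] args) {
        String everything = "Hello there. How are you? Fine!";
        // prints [Hello there, How are you, Fine!]
        System.out.println(Arrays.toString(splitDroppingPunctuation(everything)));
    }
}
```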

Bohemian