I'm trying to build a regular expression that splits a paragraph into sentences separated by a period (.). This works:

String[] str = text.split("\\.");

However, I need a minimum of robustness, for example checking that the period is followed by a space and an uppercase letter. Here's my next attempt:

String text = "The pen is on the table. The table has a pen upon it.";
String[] arr = text.split("\\. [A-Z]");

for (String s : arr)
    System.out.println(s);

Output:
The pen is on the table
he table has a pen upon it.

Unfortunately, the first character after the period is missing from the second sentence. Is there a way to fix this?

Zabuzard
Francesco Marchioni

1 Answer

You can use a lookahead to check what comes next in the string without consuming it, so the uppercase letter is not part of the delimiter and stays in the result.

text.split("\\. (?=[A-Z])");
{ "The pen is on the table", "The table has a pen upon it." }

If you want to keep the periods as well, you can also use a lookbehind:

text.split("(?<=\\.) (?=[A-Z])");
{ "The pen is on the table.", "The table has a pen upon it." }
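
For anyone who wants to try both patterns side by side, here is a minimal sketch. The class name `SentenceSplitDemo` and the two helper method names are made up for illustration; only `String.split` and the two regular expressions come from the answer above.

```java
import java.util.Arrays;

// Hypothetical demo class; the class and method names are illustrative only.
class SentenceSplitDemo {

    // Splits on ". " only when an uppercase ASCII letter follows.
    // The lookahead (?=[A-Z]) is zero-width, so the letter is not consumed
    // and stays at the start of the next sentence.
    static String[] splitDroppingPeriod(String text) {
        return text.split("\\. (?=[A-Z])");
    }

    // Splits on the single space that sits between a period (lookbehind)
    // and an uppercase ASCII letter (lookahead), so the period is kept too.
    static String[] splitKeepingPeriod(String text) {
        return text.split("(?<=\\.) (?=[A-Z])");
    }

    public static void main(String[] args) {
        String text = "The pen is on the table. The table has a pen upon it.";
        System.out.println(Arrays.toString(splitDroppingPeriod(text)));
        System.out.println(Arrays.toString(splitKeepingPeriod(text)));
    }
}
```

Note that both patterns still split after abbreviations such as "Mr. Smith", since that is also a period followed by a space and an uppercase letter, and `[A-Z]` only covers ASCII uppercase letters.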
khelwood