
I have this regex, which is supposed to remove a sentence delimiter (. or ?) from the end of a sentence:

sentence = sentence.replaceAll("\\.|\\?$","");

It works fine in simple cases. It converts

"I am Java developer." to "I am Java developer"

"Am I a Java developer?" to "Am I a Java developer"

But after deployment we found that it also removes every other dot in the sentence. For example:

"Hi.Am I a Java developer?" becomes "HiAm I a Java developer"

Why is this happening?

codaddict
user489849

4 Answers


The pipe (|) has the lowest precedence of all operators. So your regex:

\\.|\\?$

is being treated as:

(\\.)|(\\?$)

which matches a . anywhere in the string, or a ? only at the end of the string.

To fix this you need to group the . and ? together as:

(?:\\.|\\?)$

You could also use:

[.?]$

Within a character class, . and ? are treated literally, so you need not escape them.
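
As a quick check of the explanation above, here is a minimal sketch that runs the original pattern and both fixes against the failing input from the question. The class and method names (DelimiterDemo, stripBuggy, stripGrouped, stripCharClass) are made up for illustration and are not from the original post.

```java
public class DelimiterDemo {
    // Original pattern: parsed as (\.)|(\?$), so every '.' is removed,
    // while '?' is removed only at the end.
    static String stripBuggy(String sentence) {
        return sentence.replaceAll("\\.|\\?$", "");
    }

    // Grouped alternation: the $ anchor now applies to both alternatives.
    static String stripGrouped(String sentence) {
        return sentence.replaceAll("(?:\\.|\\?)$", "");
    }

    // Character class: same behavior here, with no escaping needed.
    static String stripCharClass(String sentence) {
        return sentence.replaceAll("[.?]$", "");
    }

    public static void main(String[] args) {
        String input = "Hi.Am I a Java developer?";
        System.out.println(stripBuggy(input));     // HiAm I a Java developer
        System.out.println(stripGrouped(input));   // Hi.Am I a Java developer
        System.out.println(stripCharClass(input)); // Hi.Am I a Java developer
    }
}
```

Only the trailing ? is removed by the two corrected patterns; the dot after "Hi" is left alone.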

codaddict

What you're saying with "\\.|\\?$" is "either a period anywhere, or a question mark as the last character".

I would recommend "[.?]$" instead in order to avoid the confusing escaping (and undesirable result, of course).

jensgram

Your problem is caused by the low precedence of the alternation operator |. Your regular expression means match one of:

  • . anywhere or
  • ? at the end of the input (or at the end of each line, if MULTILINE mode is enabled).

Use a character class instead:

"[.?]$"
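
Whether $ means "end of the string" or "end of a line" depends on the matching mode. As a small sketch (the class and method names are hypothetical), the following compares Java's default mode with Pattern.MULTILINE on a two-line input. By default $ matches only at the end of the input, or just before a final line terminator.

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // Default mode: $ matches only at the end of the input
    // (or just before a final line terminator).
    static String stripAtEndOfInput(String text) {
        return text.replaceAll("[.?]$", "");
    }

    // MULTILINE mode: $ also matches just before each line terminator.
    static String stripAtEndOfEachLine(String text) {
        return Pattern.compile("[.?]$", Pattern.MULTILINE)
                      .matcher(text)
                      .replaceAll("");
    }

    public static void main(String[] args) {
        String text = "Is this Java?\nYes it is.";
        System.out.println(stripAtEndOfInput(text));    // only the final '.' is removed
        System.out.println(stripAtEndOfEachLine(text)); // the '?' and the '.' are both removed
    }
}
```

For single-sentence input, as in the question, the two modes behave the same.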
Mark Byers

You forgot to enclose the sentence-ending characters in parentheses, so the $ anchor applies only to the ?:

sentence = sentence.replaceAll("(\\.|\\?)$","");

A better approach is to use [.?]$, as @Mark Byers suggested:

sentence = sentence.replaceAll("[.?]$","");
splash