
I need to see if a whole word exists in a string. This is how I try to do it:

if(text.matches(".*\\" + word + "\\b.*"))
    // do something

It works for most words, but words that start with a g cause an error:

Exception in thread "main" java.util.regex.PatternSyntaxException:
Illegal/unsupported escape sequence near index 3 
.*\great life\b.*
   ^

How can I fix this?

Asked by Eddy, edited by Thomas Ayoub

3 Answers


The actual reason for the error is that, in a Java regex pattern, you cannot put a backslash before an alphabetic character unless the pair forms a valid escape construct.

See the Java regex documentation:

It is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct.
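To see this rule in action, here is a small sketch that tries to compile a few backslash escapes and reports which ones `java.util.regex.Pattern` rejects. The class `EscapeCheck` and the helper `isValidRegex` are hypothetical names used only for this illustration, and the exact set of accepted letters may differ between Java versions.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapeCheck {
    // Hypothetical helper: true if the pattern compiles,
    // false if Pattern.compile throws PatternSyntaxException.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The Java literal "\\g" is the two-character regex \g.
        // "\\." is a backslash before a non-alphabetic character, which is always allowed.
        String[] candidates = {"\\b", "\\d", "\\w", "\\g", "\\j", "\\."};
        for (String regex : candidates) {
            System.out.println(regex + " -> " + (isValidRegex(regex) ? "ok" : "rejected"));
        }
    }
}
```

The regex from the question, `.*\great life\b.*`, fails for the same reason: `\g` is a backslash followed by a letter that does not denote any construct.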

I'd use

Matcher m = Pattern.compile("\\b" + word + "\\b").matcher(text);
if (m.find()) {
    // A match is found
}
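A self-contained version of the `find()` approach, wrapped in a helper so it can be run directly (the names `WholeWordFind` and `containsWord` are illustrative, not from the answer). Unlike `String.matches()`, `Matcher.find()` looks for the pattern anywhere in the text, so no leading or trailing `.*` is needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordFind {
    // True if `word` occurs in `text` delimited by word boundaries.
    // As in the answer, `word` is inserted into the regex unescaped,
    // so this assumes it contains no regex metacharacters.
    static boolean containsWord(String text, String word) {
        Matcher m = Pattern.compile("\\b" + word + "\\b").matcher(text);
        return m.find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("what a great life!", "great life"));
        System.out.println(containsWord("the greatest lifestyle", "great life"));
    }
}
```

The first call finds the phrase; the second does not, because "great" is immediately followed by more word characters.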

If the word may contain, start with, or end with special characters, I'd use

Matcher m = Pattern.compile("(?!\\B\\w)" + Pattern.quote(word) + "(?<!\\w\\B)").matcher(text);
if (m.find()) {
    // A match is found
}
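Here is a runnable sketch of the lookaround pattern above, wrapped in a helper (the class and method names are illustrative). It shows the pattern finding words that begin or end with non-word characters, such as `.NET` or `C++`, where a plain `\b...\b` would fail, while still rejecting a partial match inside a longer word.

```java
import java.util.regex.Pattern;

class WholeWordQuoted {
    // (?!\B\w)  : if the match starts with a word char, the char before it must not be a word char
    // (?<!\w\B) : if the match ends with a word char, the char after it must not be a word char
    // Pattern.quote makes characters such as '.', '+' or '[' literal.
    static boolean containsWord(String text, String word) {
        String regex = "(?!\\B\\w)" + Pattern.quote(word) + "(?<!\\w\\B)";
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("This is .NET", ".NET"));
        System.out.println(containsWord("I like C++ a lot", "C++"));
        System.out.println(containsWord("the greatest life", "great"));
    }
}
```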
Answered by Wiktor Stribiżew

Using ".*\\" + word + "\\b.*" with word = "great life" builds the same string as the literal ".*\\great life\\b.*", whose actual value is .*\great life\b.*. The issue is that \g is not a valid escape sequence in Java (see What are all the escape characters in Java?). Strictly speaking, the string itself is fine; it is the regex engine (java.util.regex.Pattern) that rejects \g.

You can use

if(text.matches(".*\\b" + word + "\\b.*"))
                     ^
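A quick check of the corrected `matches()` call, using the phrase from the question (the class and method names are illustrative). With `\\b` in place of the stray `\\`, the pattern compiles, and the surrounding `.*` lets `matches()` succeed even though it has to consume the whole string.

```java
class MatchesFix {
    // Corrected version of the question's check: ".*\\b" instead of ".*\\".
    static boolean containsWord(String text, String word) {
        return text.matches(".*\\b" + word + "\\b.*");
    }

    public static void main(String[] args) {
        System.out.println(containsWord("what a great life!", "great life"));
        System.out.println(containsWord("the greatest lifestyle", "great life"));
    }
}
```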
Answered by Thomas Ayoub
  • `String.matches()` is a bit misleading because it tries to match the *entire string* against the regex, as if the regex were wrapped in `^...$`. So the `.*` is important in this case. – Tamas Rev Apr 25 '17 at 16:50
  • @TamasRev thanks for the clarification, I've edited accordingly – Thomas Ayoub Apr 25 '17 at 16:58
  • Actually, the `\g` error has nothing to do with the escape sequences of Java; the problem is that it’s not a valid escape sequence in [Java’s regular expression language.](http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html) – VGR Apr 25 '17 at 17:33
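The difference described in these comments is easy to see side by side (the class and method names are illustrative): `String.matches()` only succeeds if the pattern covers the whole input, while `Matcher.find()` succeeds on any matching substring.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // matches() must consume the entire input, like an implicit ^...$
    static boolean viaMatches(String text, String regex) {
        return text.matches(regex);
    }

    // find() searches for the pattern anywhere in the input
    static boolean viaFind(String text, String regex) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "what a great life";
        System.out.println(viaMatches(text, "great"));
        System.out.println(viaMatches(text, ".*great.*"));
        System.out.println(viaFind(text, "great"));
    }
}
```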

A \\ followed by any character is interpreted by the regex engine as an escape sequence. E.g. ".*\\geza\\b.*" will try to use the \g escape sequence, ".*\\jani\\b.*" will try to use \j, etc.

Some of these sequences exist and others don't; you can check the Pattern docs for details. More importantly, this is probably not what you want.

I agree with Thomas Ayoub that you probably need to match \\b...\\b to find a word. I would go one step further and use Pattern.quote to avoid unintended regex features that might come from word:

String text = "Lorem Ipsum a[asd]a sad";
String word = "a[asd]a";
if (text.matches(".*\\b" + Pattern.quote(word) + "\\b.*")) {
    // do something
}
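To make the point about `Pattern.quote` concrete, this sketch runs the example above with and without quoting (the class and method names are illustrative). Unquoted, `[asd]` becomes a character class, so the literal text `a[asd]a` is not found, while an unrelated word such as `asa` would be.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Unquoted: "a[asd]a" means "a", then one of a/s/d, then "a"
    static boolean unquoted(String text, String word) {
        return text.matches(".*\\b" + word + "\\b.*");
    }

    // Quoted: Pattern.quote wraps the word in \Q...\E so every character is literal
    static boolean quoted(String text, String word) {
        return text.matches(".*\\b" + Pattern.quote(word) + "\\b.*");
    }

    public static void main(String[] args) {
        String word = "a[asd]a";
        System.out.println(unquoted("Lorem Ipsum a[asd]a sad", word));
        System.out.println(quoted("Lorem Ipsum a[asd]a sad", word));
        System.out.println(unquoted("an asa here", word));
    }
}
```

For ordinary words without metacharacters both versions behave the same, which is why quoting can seem unnecessary until such a word shows up.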
Answered by Tamas Rev
  • I don't need Pattern.quote; it works without it. I'm accepting this answer because of the explanation. Thanks. – Eddy Apr 26 '17 at 12:55
  • @Eddy However, the solution may turn out much slower if the input string is quite long, as the first `.*` will make the regex engine grab the whole *line* first, and then backtracking will occur trying to find the subsequent patterns. Also, mind that `\b` is a context-dependent pattern, and `".*\\b\\.NET\\b.*"` won't match `.NET` in `This is .NET`. Another downside of this solution is that it won't find a match if the input string has line breaks. [My solution](https://stackoverflow.com/a/43616524/3832970) will. – Wiktor Stribiżew Feb 05 '19 at 07:57
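Two of the caveats in this comment can be reproduced with a short sketch (the names are illustrative): the `matches()`-based check misses a word on a second line and a word that starts with a non-word character, while the lookaround-plus-`find()` version from the first answer handles both.

```java
import java.util.regex.Pattern;

class CommentCaveats {
    // The accepted answer's approach: matches() with surrounding .*
    static boolean viaMatches(String text, String word) {
        return text.matches(".*\\b" + Pattern.quote(word) + "\\b.*");
    }

    // Lookaround + find() approach from the first answer
    static boolean viaFind(String text, String word) {
        String regex = "(?!\\B\\w)" + Pattern.quote(word) + "(?<!\\w\\B)";
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // '.' does not match line terminators by default, so .* cannot span lines
        String multiLine = "first line\na great life";
        System.out.println(viaMatches(multiLine, "great"));
        System.out.println(viaFind(multiLine, "great"));

        // \b requires a word char on exactly one side; ' ' and '.' are both non-word chars
        String dotNet = "This is .NET";
        System.out.println(viaMatches(dotNet, ".NET"));
        System.out.println(viaFind(dotNet, ".NET"));
    }
}
```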