I need to match an exact substring in a string in Java. I've tried with

String pattern = "\\b"+subItem+"\\b";

But it doesn't work if my substring contains non-alphanumeric characters. I want this to work exactly like the "Match whole word only" option in Notepad++. Could you help?

Wiktor Stribiżew
  • What can `subItem` look like when "it does not work"? What do you consider a word? – Wiktor Stribiżew Jun 21 '21 at 12:02
  • Any combination of alphanumerical and non alphanumerical characters. Like: Data[3] or pst->numberOfTx. – Robert Ilin Jun 21 '21 at 12:04
  • So, any non-whitespace? Does `String pattern = "(?<!\\S)"+Pattern.quote(subItem)+"(?!\\S)";` solve the issue? – Wiktor Stribiżew Jun 21 '21 at 12:05
  • On this string: Data[3]=(uint8)Data[1]; It doesn't work for searching Data[3] – Robert Ilin Jun 21 '21 at 12:08
  • Ok, if the words with special chars at the start/end are not to be glued to word chars, try `String pattern = "(?<!\\w)"+Pattern.quote(subItem)+"(?!\\w)";` Else, you will need to build word boundaries by checking the first/final chars in the `subItem` – Wiktor Stribiżew Jun 21 '21 at 12:09
  • My guess is your `subItem` content isn't escaped. So when it is `Data[3]`, it looks for `Data3` instead of `Data[3]`, because `[` and `]` are special characters in Regex. – Nino DELCEY Jun 21 '21 at 12:15
  • Does this answer your question? [How to escape text for regular expression in Java](https://stackoverflow.com/questions/60160/how-to-escape-text-for-regular-expression-in-java) – tevemadar Jun 21 '21 at 12:16
  • It might be more complex than just replacing `subItem` with `Pattern.quote(subItem)` – Wiktor Stribiżew Jun 21 '21 at 12:24
  • @WiktorStribiżew what's the difference between your last comment and the first one? Because the first one seems to work fine. I mean `String pattern = "(?<!\\w)"+Pattern.quote(subItem)+"(?!\\w)";` – Robert Ilin Jun 21 '21 at 12:57
  • @RobertIlin The `(?<!\w)` and `(?!\w)` might fail to find `[Data]` in `text[Data]text`. I understood you need to match these cases. – Wiktor Stribiżew Jun 21 '21 at 12:58

1 Answer

I suggest either unambiguous word boundaries (that match a string only if the search pattern is not enclosed by letters, digits or underscores):

String pattern = "(?<!\\w)"+Pattern.quote(subItem)+"(?!\\w)";

where (?<!\w) matches a location that is not immediately preceded by a word char and (?!\w) fails the match if a word char immediately follows the current position (see this regex demo). Alternatively, you can use a variation that takes into account leading/trailing special chars of the potential match:

String pattern = "(?:\\B(?!\\w)|\\b(?=\\w))" + Pattern.quote(subItem) + "(?:(?<=\\w)\\b|(?<!\\w)\\B)";

See the regex demo.
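As a rough sketch, the two patterns above can be wrapped in small helpers and tried on the strings from the question and comments. The class name `WholeWordDemo` and the helpers `simple`, `adaptive` and `found` are made up for this illustration, and the sample inputs other than `Data[3]=(uint8)Data[1];` are assumptions:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the two regex strings come from the answer.
public class WholeWordDemo {

    // Variant 1: lookarounds that reject a word char directly before or after the match.
    public static Pattern simple(String subItem) {
        return Pattern.compile("(?<!\\w)" + Pattern.quote(subItem) + "(?!\\w)");
    }

    // Variant 2: boundary type picked from the first and last char of the match.
    public static Pattern adaptive(String subItem) {
        return Pattern.compile(
            "(?:\\B(?!\\w)|\\b(?=\\w))" + Pattern.quote(subItem) + "(?:(?<=\\w)\\b|(?<!\\w)\\B)");
    }

    // True if the pattern occurs anywhere in the text.
    public static boolean found(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String line = "Data[3]=(uint8)Data[1];";
        System.out.println(found(simple("Data[3]"), line));             // true
        System.out.println(found(adaptive("Data[3]"), line));           // true
        // "Data[3]" glued to a preceding letter is not a whole word.
        System.out.println(found(simple("Data[3]"), "myData[3]=0;"));   // false
        System.out.println(found(adaptive("Data[3]"), "myData[3]=0;")); // false
        System.out.println(found(simple("pst->numberOfTx"), "x = pst->numberOfTx;")); // true
    }
}
```

On these particular inputs both variants return the same results, so it may be worth testing them against your own data before choosing one.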

Details:

  • (?:\B(?!\w)|\b(?=\w)) - either a non-word boundary if the next char is not a word char, or a word boundary if the next char is a word char
  • Data\[3\] - this is a quoted subItem (Pattern.quote actually produces \QData[3]\E, which matches the same literal text)
  • (?:(?<=\w)\b|(?<!\w)\B) - either a word boundary if the preceding char is a word char, or a non-word boundary if the preceding char is not a word char.
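To illustrate why the boundary has to depend on the adjacent character, here is a minimal sketch that runs the original `\b` approach on the string from the comments. The class name `BoundaryDemo` and the helper `found` are hypothetical and exist only for this example:

```java
import java.util.regex.Pattern;

// Hypothetical demo class showing how \b and \B behave next to "]".
public class BoundaryDemo {

    // True if the regex occurs anywhere in the text.
    public static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "Data[3]=(uint8)Data[1];";
        String subItem = "Data[3]";

        // Unescaped: "[3]" is a character class, so the regex looks for "Data3".
        System.out.println(found("\\b" + subItem + "\\b", text)); // false

        // Quoted, but the trailing \b sits between "]" and "=",
        // two non-word chars, so there is no word boundary there.
        System.out.println(found("\\b" + Pattern.quote(subItem) + "\\b", text)); // false

        // \B is what holds between two non-word chars.
        System.out.println(found("\\b" + Pattern.quote(subItem) + "\\B", text)); // true
    }
}
```

The last line hard-codes `\B` for a `subItem` that ends with a non-word char; the alternation described above makes that choice automatically for either kind of ending.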
Wiktor Stribiżew