I want to match a word and find its index when it is surrounded by either spaces or special characters. For example:

To find: test
this is input test : True
this is#input_ : True
this isinput : False
thisisinputtest: False
this @test is right: True.

How do I match this and find the index? My current regex fails: (?i)[^a-zA-Z0-9]test[^a-zA-Z0-9]

Maxsteel

2 Answers

I think you need to use lookarounds in your case:

(?<!\p{Alnum})test(?!\p{Alnum})

The negative lookbehind (?<!\p{Alnum}) fails the match if there is an alphanumeric char immediately to the left of test, and the negative lookahead (?!\p{Alnum}) fails the match if there is an alphanumeric char right after test.

Java demo:

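// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String str = "this is#test_ :";
// The lookarounds only check the neighbouring chars; they do not consume them.
Pattern ptrn = Pattern.compile("(?<!\\p{Alnum})test(?!\\p{Alnum})");
Matcher matcher = ptrn.matcher(str);
while (matcher.find()) {
    System.out.println(matcher.start()); // start index of each match; prints 8 here
}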
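
For anyone who wants to run this as-is, here is a minimal self-contained sketch of the same lookaround approach. The class name `WordIndexFinder` and the helper `findIndexes` are made up for illustration, and the case-insensitive flag is an assumption taken from the `(?i)` in the question's regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordIndexFinder {

    // Hypothetical helper: start indexes of every occurrence of `word`
    // that has no ASCII alphanumeric char directly before or after it.
    static List<Integer> findIndexes(String input, String word) {
        // Pattern.quote keeps any regex metacharacters in `word` literal.
        Pattern pattern = Pattern.compile(
                "(?<!\\p{Alnum})" + Pattern.quote(word) + "(?!\\p{Alnum})",
                Pattern.CASE_INSENSITIVE); // assumption: mirrors (?i) from the question
        Matcher matcher = pattern.matcher(input);
        List<Integer> indexes = new ArrayList<>();
        while (matcher.find()) {
            indexes.add(matcher.start());
        }
        return indexes;
    }

    public static void main(String[] args) {
        System.out.println(findIndexes("this is#test_ :", "test"));     // [8]
        System.out.println(findIndexes("this @test is right", "test")); // [6]
        System.out.println(findIndexes("thisisinputtest:", "test"));    // []
    }
}
```

An empty list plays the role of "False" from the question; any non-empty list is "True" and also carries the indexes.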

Alternative way: match and capture the search word, and print the start position of the 1st capturing group:

Pattern ptrn = Pattern.compile("\\P{Alnum}(test)\\P{Alnum}");
...
System.out.println(matcher.start(1));

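NOTE that in this scenario \P{Alnum} is a consuming pattern: it has to match an actual character on each side. So test is not matched at the very start or end of the string, or when two occurrences are separated by a single character (as in test test).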
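
To make those edge cases concrete, here is a small sketch that counts matches for both patterns on one input. The class name `ConsumingVsLookaround`, the helper `countMatches`, and the sample string are all made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsumingVsLookaround {

    // Hypothetical helper: number of matches of `regex` found in `input`.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String consuming = "\\P{Alnum}(test)\\P{Alnum}";
        String lookaround = "(?<!\\p{Alnum})test(?!\\p{Alnum})";

        // "test" sits at the start, at the end, and twice in the middle
        // where the two occurrences share a single space.
        String input = "test a test test b test";

        System.out.println(countMatches(consuming, input));  // 1
        System.out.println(countMatches(lookaround, input)); // 4
    }
}
```

The consuming pattern only finds the occurrence at index 7: the first and last have no character on one side, and the one at index 12 lost its leading space to the previous match.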

Wiktor Stribiżew
  • Although I wasn't clear in the question (sorry about that!), this is exactly what I wanted! Thanks a lot! – Maxsteel Nov 13 '16 at 10:06
  • For those who only need alphabetic characters and not alphanumeric, you can use `"(?<!\\p{Alpha})test(?!\\p{Alpha})"` – Dat Nguyen Jul 13 '17 at 03:10
  • @DatNguyen: Note that `\p{Alpha}` works with ASCII only letters by default (if you do not specify the `Pattern.UNICODE_CHARACTER_CLASS` flag). To match any Unicode letters without depending on flags, use `"(?<!\\p{L})test(?!\\p{L})"`. – Wiktor Stribiżew Jul 13 '17 at 07:18
  • Good to know. Thanks @WiktorStribiżew! – Dat Nguyen Jul 13 '17 at 07:19
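
A quick sketch of the difference discussed in the comments above, using an input where test is glued to a non-ASCII letter. The class name `UnicodeBoundaryDemo`, the helper `found`, and the sample string are made up for illustration.

```java
import java.util.regex.Pattern;

class UnicodeBoundaryDemo {

    // Hypothetical helper: true if `regex`, compiled with `flags`, matches anywhere in `input`.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "\u00e9test"; // "étest": 'é' is a letter, but not an ASCII one
        String alpha = "(?<!\\p{Alpha})test(?!\\p{Alpha})";
        String letter = "(?<!\\p{L})test(?!\\p{L})";

        // Default \p{Alpha} is ASCII-only, so 'é' does not block the match.
        System.out.println(found(alpha, 0, input)); // true
        // With the flag, \p{Alpha} covers Unicode alphabetic chars.
        System.out.println(found(alpha, Pattern.UNICODE_CHARACTER_CLASS, input)); // false
        // \p{L} covers Unicode letters regardless of flags.
        System.out.println(found(letter, 0, input)); // false
    }
}
```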

I am just trying to understand your question. You are looking for test surrounded by a special character (_ included) or a space? Yet you say this is#input_ : True. I may be reading this wrong, but how is that true in that case?

Anyway, I have the regex [\W\s_](input|test)[\W\s_], which matches all the cases you marked as true.

  • \W matches any non-word character
  • \s matches any whitespace character
  • _ matches an underscore; it has to be listed on its own because it counts as a word character
  • Because of my confusion about which word to find, the regex searches for both input and test
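
Since the question is about Java and also asks for the index, here is a rough sketch of how this regex could be used there. The class name `CharClassDemo` and the helper `firstIndex` are made up for illustration, and the inputs are the question's examples without the True/False labels.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {

    static final Pattern PATTERN = Pattern.compile("[\\W\\s_](input|test)[\\W\\s_]");

    // Hypothetical helper: start index of the first captured word, or -1 if none.
    static int firstIndex(String input) {
        Matcher matcher = PATTERN.matcher(input);
        // Group 1 is the word itself, without the surrounding delimiter chars.
        return matcher.find() ? matcher.start(1) : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstIndex("this is input test"));  // 8
        System.out.println(firstIndex("this is#input_"));      // 8
        System.out.println(firstIndex("this isinput"));        // -1
        System.out.println(firstIndex("thisisinputtest"));     // -1
        System.out.println(firstIndex("this @test is right")); // 6
    }
}
```

Because the character classes consume a character on each side, a word at the very start or end of the input is not matched by this regex.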

Also, I use this site any time I work with regex, as I find it so useful.

Not sure if this is the answer you're looking for, but let me know if I am wrong and I will try again.

Dan