I'm trying to craft two regular expressions that will match URIs. These URIs are of the format: /foo/someVariableData and /foo/someVariableData/bar/someOtherVariableData

I need two regexes. Each needs to match one but not the other.

The regexes I originally came up with are: /foo/.+ and /foo/.+/bar/.+ respectively.

I think the second regex is fine: it only matches the second string. The first regex, however, matches both. So I started playing around (for the first time) with negative lookahead. I came up with the regex /foo/.+(?!bar) and set up the following code to test it:

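public class RegexTest {
    public static void main(String[] args) {
        String shouldWork = "/foo/abc123doremi";
        String shouldntWork = "/foo/abc123doremi/bar/def456fasola";
        // Negative lookahead placed after the greedy .+ (the attempt in question)
        String regex = "/foo/.+(?!bar)";
        // String.matches() requires the regex to match the entire string
        System.out.println("ShouldWork: " + shouldWork.matches(regex));
        System.out.println("ShouldntWork: " + shouldntWork.matches(regex));
    }
}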

And, of course, both of them resolve to true.

Does anybody know what I'm doing wrong? I don't necessarily need to use negative lookahead; I just need to solve the problem, and I think negative lookahead might be one way to do it.

Thanks,

james.garriss
Cody S

1 Answer

Try

String regex = "/foo/(?!.*bar).+";

or possibly

String regex = "/foo/(?!.*\\bbar\\b).+";

to avoid failures on paths like /foo/baz/crowbars, which I assume you do want that regex to match.
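As a quick check of the suggestion above, here is a minimal sketch that runs the word-boundary version next to the asker's second regex on the sample URIs. The class name `UriRegexDemo` and the helper names `isFooOnly` and `isFooBar` are made up for illustration; only the two patterns come from this thread.

```java
import java.util.regex.Pattern;

public class UriRegexDemo {
    // Suggested pattern: reject URIs that contain "bar" as a whole word after /foo/
    static final Pattern FOO_ONLY = Pattern.compile("/foo/(?!.*\\bbar\\b).+");
    // The asker's second pattern, unchanged
    static final Pattern FOO_BAR = Pattern.compile("/foo/.+/bar/.+");

    // Hypothetical helper: true for URIs of the form /foo/someVariableData
    static boolean isFooOnly(String uri) {
        return FOO_ONLY.matcher(uri).matches();
    }

    // Hypothetical helper: true for URIs of the form /foo/x/bar/y
    static boolean isFooBar(String uri) {
        return FOO_BAR.matcher(uri).matches();
    }

    public static void main(String[] args) {
        String[] uris = {
            "/foo/abc123doremi",
            "/foo/abc123doremi/bar/def456fasola",
            "/foo/baz/crowbars"
        };
        for (String uri : uris) {
            System.out.println(uri + " -> fooOnly=" + isFooOnly(uri)
                    + ", fooBar=" + isFooBar(uri));
        }
    }
}
```

With these inputs, each of the first two URIs should be accepted by exactly one of the patterns, and /foo/baz/crowbars should be accepted by the first pattern only.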

Explanation: (without the double backslashes required by Java strings)

/foo/ # Match "/foo/"
(?!   # Assert that it's impossible to match the following regex here:
 .*   #   any number of characters
 \b   #   followed by a word boundary
 bar  #   followed by "bar"
 \b   #   followed by a word boundary.
)     # End of lookahead assertion
.+    # Match one or more characters

\b, the "word boundary anchor", matches the zero-width position between a word character (a letter, digit, or underscore) and a non-word character (or between the start/end of the string and a word character). Therefore, in "/bar/" it matches before the b and after the r, but it fails to match between w and b in "crowbar".
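To see the word-boundary behaviour on its own, here is a small sketch that searches for \bbar\b in a few strings. The class name `WordBoundaryDemo` and the helper `containsWholeWordBar` are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // "bar" with a word boundary required on both sides
    static final Pattern WHOLE_WORD_BAR = Pattern.compile("\\bbar\\b");

    // Hypothetical helper: true if "bar" occurs as a whole word anywhere in s
    static boolean containsWholeWordBar(String s) {
        // find() looks for a match anywhere, unlike matches()
        return WHOLE_WORD_BAR.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "/foo/x/bar/y",      // "bar" sits between two slashes
            "/foo/baz/crowbars", // "bar" is surrounded by letters
            "bar"                // start and end of the string also count as boundaries
        };
        for (String s : samples) {
            System.out.println(s + " -> " + containsWholeWordBar(s));
        }
    }
}
```

Only the second sample should print false, since neither side of "bar" in "crowbars" is a word boundary.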

Protip: Take a look at http://www.regular-expressions.info - a great regex tutorial.

Tim Pietzcker
The `.*` **inside** the negative lookahead expression, `(?!.*bar)`, is key here, rather than outside it, as in `.*(?!bar)`. Thanks. – Gary Oct 24 '18 at 14:57
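The placement point in the comment above can be checked with a short sketch. As far as I can tell, the original attempt fails because the greedy .+ first consumes the rest of the string, so (?!bar) is tested at the very end, where nothing follows and the assertion trivially succeeds. Putting the lookahead right after /foo/, with .* inside it, scans the whole remainder for "bar" before anything is consumed. The class and method names below are hypothetical.

```java
public class LookaheadPlacementDemo {
    // Original attempt: the lookahead is checked after .+ has consumed the string
    static boolean lookaheadAfter(String uri) {
        return uri.matches("/foo/.+(?!bar)");
    }

    // Suggested fix: the lookahead scans ahead from just after /foo/
    static boolean lookaheadBefore(String uri) {
        return uri.matches("/foo/(?!.*bar).+");
    }

    public static void main(String[] args) {
        String shouldWork = "/foo/abc123doremi";
        String shouldntWork = "/foo/abc123doremi/bar/def456fasola";

        System.out.println("after,  shouldWork:   " + lookaheadAfter(shouldWork));
        System.out.println("after,  shouldntWork: " + lookaheadAfter(shouldntWork));
        System.out.println("before, shouldWork:   " + lookaheadBefore(shouldWork));
        System.out.println("before, shouldntWork: " + lookaheadBefore(shouldntWork));
    }
}
```

The "after" variant should report true for both strings, which is the behaviour described in the question, while the "before" variant should report true only for the first.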