
When using angle brackets in a Java regex, what is the difference between "\\<" and just "<"?

When I tested them, the results were the same (though I may have missed some cases).

If they are the same, why do people add the redundant "\\"? Like this or this?

Note: I'm not asking how to remove HTML from a string, so please don't recommend tools like JSoup or JTidy.

Sanghyun Lee
  • [Even Jon Skeet cannot parse HTML using regular expressions.](http://stackoverflow.com/a/1732454/963076) – ryvantage Dec 27 '13 at 05:16

1 Answer


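The angle-bracket characters are part of Java's lookbehind syntax (`(?<=X)` and `(?<!X)`) and its named-group syntax (`(?<name>X)` and `\k<name>`), so it can make sense to escape them defensively if the pattern includes any segments that are provided at runtime. Outside those constructs, "<" is an ordinary literal, and "\\<" matches exactly the same text.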
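
As a rough illustration, here is a minimal sketch, assuming a small HTML-like test string. The class name `AngleBracketDemo` and the helpers `countMatches` and `tagPattern` are made up for this example. It compares the escaped and unescaped forms and shows `Pattern.quote` as one way to make a runtime-supplied segment literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AngleBracketDemo {

    // Hypothetical helper: counts how many times the regex matches in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Hypothetical helper: wraps a runtime-supplied fragment in angle brackets.
    // Pattern.quote makes every character of the fragment match literally,
    // so metacharacters in it cannot change the meaning of the pattern.
    static Pattern tagPattern(String runtimeFragment) {
        return Pattern.compile("<" + Pattern.quote(runtimeFragment) + ">");
    }

    public static void main(String[] args) {
        String html = "<b>bold</b> and <i>italic</i>";

        // A bare "<" and an escaped "\\<" are both literal angle brackets here.
        System.out.println(countMatches("<[^>]+>", html));
        System.out.println(countMatches("\\<[^>]+\\>", html));

        // "<" is only special right after "(?", as in this lookbehind.
        System.out.println(countMatches("(?<=<)b", html));

        // The "." in the quoted fragment matches only a literal dot.
        System.out.println(tagPattern("a.c").matcher("<abc>").find());
        System.out.println(tagPattern("a.c").matcher("<a.c>").find());
    }
}
```

The first two counts should agree for any input, since Java allows a backslash before any non-alphabetic character whether or not that character is special.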

chrylis -cautiouslyoptimistic-
  • I don't see how this could be an issue. From your link (Java API) it looks like the only times when `<` is a special character are `\k<name>`, `(?<name>X)` and lookbehind (`(?<=X)` and `(?<!X)`). The first is the only valid use of `\k`, so there is no ambiguity there. Any character after `(?` must be special, so no one is going to write `Pattern.compile("(?" + myString + ")")`. Perhaps some people find it easier to write `\\<` than to make absolutely sure it's unnecessary. I know I've done this in other situations. – David Knipe Dec 27 '13 at 13:24
  • @DavidKnipe Have you heard of injection attacks? What if `myString` is what contains the magic opening bit? – chrylis -cautiouslyoptimistic- Dec 27 '13 at 18:45
  • Yes, I have. And I realise that `Pattern.compile("(?" + myString + ")")` is dodgy. But my point is that it's a piece of code that no one would want to use anyway, even ignoring security. To be of any use, the first character of `myString` would have to be a special character. The writer of that snippet would be expecting `myString` to begin with `<`, or `=` (lookahead), or `:` (non-capturing), etc. You wouldn't want a regex with such diverse behaviour, and you wouldn't want the behaviour to be specified by the first character of `myString` in this way. – David Knipe Dec 27 '13 at 20:31