I need a regex which can match "/ but not "/" or "/>. So for example:

  • "/hello" : match the "/
  • "/cool.php" : match the "/
  • a + "/" + b : do not match
  • <input name="hello"/> : do not match

Using a regex builder, I came up with "\/[^\>"], which matches "/ plus one extra character. So for "/hello" it matches "/h, which is wrong.
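The behaviour described above can be reproduced with a short script. This is only a sketch in Python, which is an assumption, since the question does not name a language; the name `attempt` and the sample list are illustrative.

```python
import re

# Pattern from the question. The class [^\>"] has to consume one character,
# so that character ends up inside the match.
attempt = re.compile(r'"\/[^\>"]')

samples = ['"/hello"', '"/cool.php"', 'a + "/" + b', '<input name="hello"/>']

for s in samples:
    m = attempt.search(s)
    # Prints the matched text, or None when nothing matches.
    print(repr(s), '->', m.group(0) if m else None)
```

The two unwanted cases are rejected correctly, but the wanted cases return `"/h` and `"/c` rather than just `"/`.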

I'm fairly new to regex. What's wrong with my regex? Could you fix it and briefly explain why it's wrong, which part you changed, and why?

Pranav C Balan
Chen Li Yong

2 Answers

Use a positive lookahead assertion:

"\/(?=[^>"]|$)

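The lookahead `(?=...)` only checks the next character without consuming it, so the match stays `"/`; the original `[^\>"]` consumed that character. The `|$` branch also allows `"/` at the very end of the input.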
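A minimal sketch of this pattern in Python (the language is an assumption, and the name `positive` is illustrative), run against the examples from the question plus one input that ends right after the slash:

```python
import re

# Positive lookahead: after "/ the next character must be something other
# than > or ", or the input must end. The lookahead consumes nothing.
positive = re.compile(r'"\/(?=[^>"]|$)')

samples = [
    '"/hello"',
    '"/cool.php"',
    'a + "/" + b',
    '<input name="hello"/>',
    'path="/',  # input ends right after the slash
]

for s in samples:
    m = positive.search(s)
    print(repr(s), '->', m.group(0) if m else None)
```

Every match is exactly `"/`, and the two inputs that should be rejected produce no match.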

Pranav C Balan

Use a negative lookahead:

"\/(?!"|>)
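As a rough check, here is the same pattern in Python (an assumed language; the name `negative` is illustrative). The negative lookahead rejects a match only when `"` or `>` comes next, so no extra end-of-input branch is needed.

```python
import re

# Negative lookahead: "/ must NOT be followed by " or >.
# At the end of the input nothing follows, so the assertion succeeds.
negative = re.compile(r'"\/(?!"|>)')

samples = [
    '"/hello"',
    '"/cool.php"',
    'a + "/" + b',
    '<input name="hello"/>',
    'path="/',  # input ends right after the slash
]

for s in samples:
    m = negative.search(s)
    print(repr(s), '->', m.group(0) if m else None)
```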
Bergi
  • I guess `"\/(?![">])` is a bit more streamlined. – Wiktor Stribiżew Jun 13 '16 at 17:14
  • @WiktorStribiżew: But one character longer! OMG :-) – Bergi Jun 13 '16 at 17:15
  • @WiktorStribiżew no: `"|>` is shorter than `[">]` – Bohemian Jun 13 '16 at 17:15
  • Shorter does not mean more efficient when it comes to regex. – Wiktor Stribiżew Jun 13 '16 at 17:16
  • @WiktorStribiżew efficient does not mean better. When the difference in performance is minuscule (as here), readability wins (for me). – Bohemian Jun 13 '16 at 17:17
  • This is great and gives me some perspective to build a regex with the same behaviour using different approach. :) – Chen Li Yong Jun 13 '16 at 17:17
  • @Bergi No need to escape the slash – Bohemian Jun 13 '16 at 17:17
  • @Bohemian: Depends on the language where it is used, the slash is a popular literal delimiter (and the OP escaped it as well) – Bergi Jun 13 '16 at 17:19
  • @Bergi Escaping a slash is technically not part of the regex, and the language would signal a syntax error in the case that it was needed, making it obvious. Many devs unnecessarily escape the slash in languages that don't use it as a delimiter. – Bohemian Jun 13 '16 at 17:21
  • @Bohemian: It depends what you count. I see 1 *branch* inside the lookahead. With the alternation, there are 2 branches. 1 is less than 2. – Wiktor Stribiżew Jun 13 '16 at 17:22
  • @WiktorStribiżew: What is a character class if not syntactical sugar for branching? – Bergi Jun 13 '16 at 17:26
  • Bergi, a character class is compiled as one atom, it is not an alternation group alternative. – Wiktor Stribiżew Jun 13 '16 at 17:36
  • @WiktorStribiżew: Yes, a character class is an atom, but does that mean a regex engine compiles and optimises it any differently? – Bergi Jun 13 '16 at 17:45
  • Yes. [Here](http://stackoverflow.com/a/26141949/3832970) you can check how backtracking works with alternation and character class. Also, Toto's and Tim Pietzcker's answers [here](http://stackoverflow.com/questions/4724588/using-alternation-or-character-class-for-single-character-matching) are worthy of checking. – Wiktor Stribiżew Jun 13 '16 at 18:17
  • @WiktorStribiżew: However, Unihedron mentions that the PCRE engine does optimise this case :-) – Bergi Jun 13 '16 at 19:32
  • Optimizes character classes, yes, but not alternation. – Wiktor Stribiżew Jun 13 '16 at 19:35
  • @WiktorStribiżew: I believe he meant that the engine optimises an alternation of single literal characters *away* into a character class. That's why he needed to disable optimisations for the screenshot. – Bergi Jun 13 '16 at 19:43
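Whatever the performance difference turns out to be in a given engine, the two spellings discussed in the comments accept the same inputs. A small sketch in Python (an assumed language; the names `alternation` and `char_class` are illustrative) compares the match positions of both forms. The slash is left unescaped here because Python does not use it as a regex delimiter.

```python
import re

# The two lookahead spellings from the comment thread.
alternation = re.compile(r'"/(?!"|>)')
char_class = re.compile(r'"/(?![">])')

samples = [
    '"/hello"',
    '"/cool.php"',
    'a + "/" + b',
    '<input name="hello"/>',
    'href="/a" src="/b" alt="/"',
]

for s in samples:
    a = [m.span() for m in alternation.finditer(s)]
    c = [m.span() for m in char_class.finditer(s)]
    # The last column shows whether both patterns matched at the same spans.
    print(repr(s), a, a == c)
```

This checks behaviour only; it says nothing about how an engine compiles or optimises either form.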