2

Here's a simple example:

Text: <input name="zzz" value="18754" type="hidden"><input name="zzz" value="18311" type="hidden"><input name="zzz" value="17138" type="hidden">

Regex: /<input.*?value="(18754|17138)".*?>/

When the matches are replaced by an empty string, the result is an empty string. I expected the middle <input> to remain since I am using non-greedy matching (.*?). Can anyone explain why it is removed?

Ree

3 Answers

5

There are two matches:

  1. <input name="zzz" value="18754" type="hidden">
  2. <input name="zzz" value="18311" type="hidden"><input name="zzz" value="17138" type="hidden">

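In the second case, the first .*? matches name="zzz" value="18311" type="hidden"><input name="zzz" (plus the surrounding spaces). It's a match and it's non-greedy: non-greedy only means the shortest match from the position where the match starts. The engine still starts at the leftmost <input it can find, and because . also matches >, the .*? keeps expanding across the tag boundary until it reaches value="17138".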
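
To see the two matches for yourself, here is a minimal sketch in Python (the question doesn't say which language is used, so Python's re module is an assumption on my part; the pattern itself is unchanged and the variable names are just for illustration):

```python
import re

# The three hidden inputs from the question, concatenated with no separators.
text = (
    '<input name="zzz" value="18754" type="hidden">'
    '<input name="zzz" value="18311" type="hidden">'
    '<input name="zzz" value="17138" type="hidden">'
)

# The original pattern: "." matches any character except a newline,
# so the lazy .*? is free to run across ">" into the next tag.
pattern = re.compile(r'<input.*?value="(18754|17138)".*?>')

# Collect the full text of every match.
matches = [m.group(0) for m in pattern.finditer(text)]
for i, m in enumerate(matches, start=1):
    print(i, m)

# Removing the matches leaves nothing behind.
remaining = pattern.sub('', text)
print(repr(remaining))
```

The second printed match should span the 18311 tag and the 17138 tag together, which is why the middle input disappears.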

NPE
2

aix already explained why it matches the middle part.

To avoid this behaviour, get rid of the .*?, instead try this:

/<input[^>]*value="(18754|17138)"[^>]*>/

See it here on Regexr

Instead of matching any character, match any character except ">". That way a match can never extend past the end of the tag it started in.
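
A quick check of the negated character class, sketched in Python (the language is my assumption, the question doesn't name one; the names text and fixed are illustrative only):

```python
import re

text = (
    '<input name="zzz" value="18754" type="hidden">'
    '<input name="zzz" value="18311" type="hidden">'
    '<input name="zzz" value="17138" type="hidden">'
)

# [^>]* cannot cross a ">", so each match is confined to a single tag.
fixed = re.compile(r'<input[^>]*value="(18754|17138)"[^>]*>')

result = fixed.sub('', text)
print(result)
# -> <input name="zzz" value="18311" type="hidden">
```

Starting from the 18311 tag, the pattern can no longer reach value="17138", so that position fails and the engine moves on to the third tag.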

stema
0

aix's answer is correct: the second match includes the 2nd and 3rd input tags.

One possible fix for your regex would be to change . to [^>], like this:

/<input[^>]*?value="(18754|17138)"[^>]*?>/

That will cause it to match any character except >. But that has the obvious problem of breaking whenever > shows up inside a quoted literal. As everyone always says: Regexes aren't designed to work on HTML. Don't use them unless you have no other choice.
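
If you do have another choice, a parser sidesteps the quoted-> problem. Here is a rough sketch using Python's standard html.parser module (Python is my assumption since the question doesn't name a language; InputFilter and strip_inputs are hypothetical names I made up for this example). It re-emits the document and skips the unwanted inputs. Note that it drops comments and doctype declarations, so treat it as an illustration rather than a finished tool:

```python
from html.parser import HTMLParser


class InputFilter(HTMLParser):
    """Re-emit the parsed markup, skipping <input> tags with unwanted values."""

    def __init__(self, unwanted_values):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.unwanted = set(unwanted_values)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "input" and dict(attrs).get("value") in self.unwanted:
            return  # drop this tag entirely
        # get_starttag_text() returns the tag exactly as it appeared in the input.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags: the raw text already contains the "/>".
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_inputs(html, unwanted_values):
    """Return html with the matching <input> tags removed."""
    parser = InputFilter(unwanted_values)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


text = (
    '<input name="zzz" value="18754" type="hidden">'
    '<input name="zzz" value="18311" type="hidden">'
    '<input name="zzz" value="17138" type="hidden">'
)
print(strip_inputs(text, {"18754", "17138"}))

# A ">" inside a quoted attribute, the case that trips up [^>]*.
tricky = '<input name="a>b" value="18754" type="hidden"><input name="ok" value="18311">'
print(strip_inputs(tricky, {"18754", "17138"}))
```

The filter compares the parsed value attribute instead of scanning raw text, so attribute order and stray > characters inside quotes stop mattering.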

ean5533