Consider the following expression:

((password|secret)(=|%3D%22))+([^&|\"|%22]*)

And value:

http://host?foo=bar&xml=%3C%3Fxml+id%3D%220abc987%22+password%3D%22secreT12aa5%22+binds%3D%222%22

The xml parameter contains the URL-encoded value <?xml id="0abc987" password="secreT12aa5" binds="2"

What I would like to achieve is to match password="secreT12aa5" and then replace it with e.g. password="****"

The problem is that the given regular expression matches only the part of the value up to the first 2. This seems to be caused by the %22 in the negated set; the percent sign appears to be ignored.

How can I change the expression to match password%3D%22secreT12aa5 (the whole password value)?

The expression should also match http://host?password=value, which it currently does.

I would also like to use this regular expression for replacements, using the replaceAll() method to mask the value of a matching parameter.

So the regex ((password)(=|%3D%22))([^&|\\"]*)(%22)? with the replacement $1[PROTECTED]$5 automatically replaces:

password=VALUE 
to => 
password=[PROTECTED]

password=VALUE&secret=VALUE 
to => 
password=[PROTECTED]&secret=[PROTECTED]

http://host?foo=bar&xml=%3C%3Fxml+id%3D%220abc987%22+password%3D%22secreT12345%22+binds%3D%222%22 
to => 
http://host?foo=bar&xml=%3C%3Fxml+id%3D%220abc987%22+password%3D%22[PROTECTED]%22+binds%3D%222%22
Peter Jurkovic

1 Answer

Note that [^&|\"|%22] is a negated character class that matches any char but &, | (yes, a pipe), ", % and 2, since inside a character class every char is treated individually, not as part of a sequence.
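To see this concretely, here is a small check. It is sketched in Python's `re` module purely for illustration (the question does not name a language, so that choice is an assumption). It runs the question's original expression against the sample URL and exposes the value group, which stops at the first `2`.

```python
import re

# The expression and the sample URL, both taken from the question.
ORIGINAL = r'((password|secret)(=|%3D%22))+([^&|\"|%22]*)'
URL = ("http://host?foo=bar&xml=%3C%3Fxml+id%3D%220abc987%22"
       "+password%3D%22secreT12aa5%22+binds%3D%222%22")

match = re.search(ORIGINAL, URL)

# Group 4 is the value part. The class excludes the single chars
# '&', '|', '"', '%' and '2', so the value ends before the first '2'.
value = match.group(4)
whole = match.group(0)

print(value)
print(whole)
```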

You may use

password(?:="?|%3D%22)(?:(?!%22)[^&\"])*"?

See the regex demo

Details

  • password - a literal substring
  • (?:="?|%3D%22) - either = followed by an optional ", or %3D%22
  • (?:(?!%22)[^&\"])* - any char other than & and " ([^&\"]) that does not start a %22 char sequence, 0 or more occurrences, as many as possible (*); this is a so-called tempered greedy token
  • "? - an optional "
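For the replacement use case from the question, the tempered greedy token can be wrapped in a small helper. The sketch below uses Python's `re` for illustration and makes two assumptions that are not part of the pattern above: the key and separator are captured in group 1 (with `secret` added as in the question's first expression), and the trailing optional quote is left out of the match so that the closing delimiter survives. The `mask` function name is hypothetical.

```python
import re

# Tempered-greedy-token pattern, adapted for replacement (see assumptions above):
# group 1 holds the key and its separator; the value is matched but not captured.
PATTERN = re.compile(r'((?:password|secret)(?:="?|%3D%22))(?:(?!%22)[^&"])*')


def mask(text: str) -> str:
    """Replace every matched value with [PROTECTED], keeping key and separator."""
    return PATTERN.sub(r'\g<1>[PROTECTED]', text)


if __name__ == "__main__":
    print(mask("password=VALUE&secret=VALUE"))
    print(mask("http://host?foo=bar&xml=%3C%3Fxml+id%3D%220abc987%22"
               "+password%3D%22secreT12345%22+binds%3D%222%22"))
```

In a flavor with `$1`-style references, the same idea would use `$1[PROTECTED]` as the replacement string.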

You may rewrite the pattern using the "unroll-the-loop" principle as

password(?:="?|%3D%22)[^&\"%]*(?:%(?!22)[^%&\"]*)*"?

See another demo.
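As a quick sanity check that the unrolled version behaves like the tempered greedy token, the following sketch compares both on a few inputs. Python's `re` is used only for illustration, and the sample strings (other than the ones from the question) are made up for this comparison.

```python
import re

TEMPERED = re.compile(r'password(?:="?|%3D%22)(?:(?!%22)[^&"])*"?')
UNROLLED = re.compile(r'password(?:="?|%3D%22)[^&"%]*(?:%(?!22)[^%&"]*)*"?')

# Illustrative inputs: plain value, value containing a '%' that does not
# start %22, a quoted value, and the URL-encoded form from the question.
SAMPLES = [
    "http://host?password=value",
    "password=a%20b&x=1",
    'password="quoted"&x=1',
    "xml=%3C+password%3D%22secreT12aa5%22+binds%3D%222%22",
]


def first_match(pattern, text):
    """Return the text of the first match, or None if there is no match."""
    m = pattern.search(text)
    return m.group(0) if m else None


results = [(first_match(TEMPERED, s), first_match(UNROLLED, s)) for s in SAMPLES]
for tempered, unrolled in results:
    print(tempered, unrolled, tempered == unrolled)
```

The unrolled form avoids running a lookahead before every single character, which is why it is usually the faster of the two on long inputs.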

Others may prefer a lazy pattern followed by an alternation that includes a lookahead:

password(?:="?|%3D%22)[^&\"]*?(?:(?=%22)|\"|$)

See yet another regex demo.
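One thing worth checking with the lazy variant is what happens when the value is followed by another parameter. Its terminator only accepts `%22`, `"` or the end of the string, while `[^&\"]` cannot step over `&`, so a value followed by `&` appears not to match at all. The sketch below (Python's `re`, for illustration) shows this, together with a hypothetical variant, not taken from the answer, that also stops in front of `&`.

```python
import re

# Lazy pattern as given above.
LAZY = re.compile(r'password(?:="?|%3D%22)[^&"]*?(?:(?=%22)|"|$)')
# Hypothetical variant (an assumption, not from the answer): also stop before '&'.
LAZY_AMP = re.compile(r'password(?:="?|%3D%22)[^&"]*?(?:(?=%22|&)|"|$)')


def first_match(pattern, text):
    """Return the text of the first match, or None if there is no match."""
    m = pattern.search(text)
    return m.group(0) if m else None


encoded = first_match(LAZY, "xml=+password%3D%22secreT12aa5%22+binds%3D%222%22")
at_end = first_match(LAZY, "http://host?password=value")
followed = first_match(LAZY, "password=value&x=1")
followed_variant = first_match(LAZY_AMP, "password=value&x=1")

print(encoded, at_end, followed, followed_variant)
```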

Wiktor Stribiżew
  • You should add the `"` character since `%22` matches that character: `(?:="|%3D%22)`. The same applies to the negation at the end. – ctwheels Oct 05 '17 at 14:00
  • @ctwheels I see, just OP left it out of the pattern. Maybe it should be made optional then... – Wiktor Stribiżew Oct 05 '17 at 14:02
  • Sounds good. This can also be used (depending on regex flavour) `password(?:="?|%3D%22)\K(.*?)(?="|%22|$)` – ctwheels Oct 05 '17 at 14:03
  • @ctwheels Since OP is testing at regexr.com, which only supports the JS regex flavor, I would refrain from assuming the target environment is PHP. Ok, the `pattern="value"` should match, so the `"?` is necessary. – Wiktor Stribiżew Oct 05 '17 at 14:07