
I have this regex to obfuscate the password:

myString = myString.replaceAll(
    "<.{1,}:Password>.{1,}</.{1,}:Password>",
    "<!--<Password></Password> Removed-->");

When myString contains the following line, it successfully obfuscates the password:

<abc:Password>myPassword</abc:Password>

But if myString contains the XML without a namespace prefix, i.e.

<Password>myPassword</Password>

it does not obfuscate the password.
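A minimal, self-contained reproduction of this behavior might look like the sketch below (the class and method names are illustrative, not from the question). The original pattern only matches when at least one character and a colon precede `Password`, so the unprefixed element is returned unchanged. Since `String` is immutable, the result of `replaceAll` has to be used or assigned.

```java
class OriginalRegexDemo {
    // The pattern from the question: requires "something:" before Password in both tags.
    static final String PATTERN = "<.{1,}:Password>.{1,}</.{1,}:Password>";
    static final String REPLACEMENT = "<!--<Password></Password> Removed-->";

    static String obfuscate(String xml) {
        // replaceAll returns a new String; xml itself is never modified.
        return xml.replaceAll(PATTERN, REPLACEMENT);
    }

    public static void main(String[] args) {
        // Prefixed element: matched and replaced.
        System.out.println(obfuscate("<abc:Password>myPassword</abc:Password>"));
        // Unprefixed element: no colon, so no match and the input comes back unchanged.
        System.out.println(obfuscate("<Password>myPassword</Password>"));
    }
}
```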

How can I extend the existing regex so that it handles both cases?

Kaizar Laxmidhar
  • How about using `([a-z]*:)?` instead of `.{1,}:`? – Thomas Mar 18 '19 at 14:59
  • You see why regular expressions are not the right tool to process XML? – Henry Mar 18 '19 at 14:59
  • What is the goal here? Do you want to display an obfuscated password, or are you planning to store this password somewhere? In the former case, why not just use `*****` or something like that? – Tim Biegeleisen Mar 18 '19 at 15:00
  • Expanding on Henry's comment: What if your XML contained multiple `whatever`? What if it was nested? There might be cases where a regex doesn't match 100% (false positives or negatives), so unless you really know what you'll get, you should note [that _regular_ expressions are a poor fit for _irregular_ languages like XML](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). – Thomas Mar 18 '19 at 15:05
  • @Thomas In my case there will be only one password entry in the entire XML string. – Kaizar Laxmidhar Mar 18 '19 at 15:47
  • https://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-reg – VGR Mar 18 '19 at 16:05
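Thomas's first suggestion can be sketched as follows (class name, method name and the nested sample input are hypothetical, chosen only for illustration). Because `[a-z]*` cannot match `<` or `>`, the prefix cannot run across neighbouring tags, whereas `.{1,}` can: on the made-up nested input, the question's original pattern also swallows the enclosing `<a:user>` start tag, which is the kind of false positive mentioned above. Note that `[a-z]*` only accepts lowercase-letter prefixes, so a prefix such as `ns1:` would not match.

```java
class PrefixComparison {
    static final String REPLACEMENT = "<!--<Password></Password> Removed-->";
    // The question's pattern: ".{1,}" can also match '<' and '>', so the prefix may span tags.
    static final String ORIGINAL = "<.{1,}:Password>.{1,}</.{1,}:Password>";
    // Thomas's suggestion: an optional prefix of lowercase letters followed by a colon.
    static final String SUGGESTED = "<([a-z]*:)?Password>.{1,}</([a-z]*:)?Password>";

    static String obfuscate(String xml, String pattern) {
        return xml.replaceAll(pattern, REPLACEMENT);
    }

    public static void main(String[] args) {
        // Both inputs from the question are handled by the suggested pattern.
        System.out.println(obfuscate("<abc:Password>myPassword</abc:Password>", SUGGESTED));
        System.out.println(obfuscate("<Password>myPassword</Password>", SUGGESTED));

        String nested = "<a:user><a:Password>s</a:Password></a:user>";
        // Prints "<!--<Password></Password> Removed--></a:user>": the <a:user> start tag is lost.
        System.out.println(obfuscate(nested, ORIGINAL));
        // Prints "<a:user><!--<Password></Password> Removed--></a:user>".
        System.out.println(obfuscate(nested, SUGGESTED));
    }
}
```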

1 Answer

Currently you specify the number of expected characters as a range, using curly braces ({1,}).

Instead, you can use + to represent "one or more" or * to represent "zero or more" of the previous character. In other words, * is the same as {0,}, and + is the same as {1,}. In addition, you can use ? to represent "zero or one", the same as {0,1}.
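One way to check these equivalences is to compare full-match results on a few sample strings with `Pattern.matches` (a sketch; the class name and sample inputs are arbitrary):

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns true if both regexes give the same full-match result for every input.
    static boolean sameResults(String regexA, String regexB, String... inputs) {
        for (String input : inputs) {
            if (Pattern.matches(regexA, input) != Pattern.matches(regexB, input)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"", "a", "aa", "aaa", "b"};
        System.out.println(sameResults("a+", "a{1,}", samples));  // true
        System.out.println(sameResults("a*", "a{0,}", samples));  // true
        System.out.println(sameResults("a?", "a{0,1}", samples)); // true
        // + requires at least one 'a', while * also accepts the empty string.
        System.out.println(Pattern.matches("a+", ""));            // false
        System.out.println(Pattern.matches("a*", ""));            // true
    }
}
```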

Using that knowledge, we can match one or more characters followed by a colon in front of the element name (Password) with this pattern:

.+:

We can then wrap that in a non-capturing group by putting it inside parentheses with ?: right after the opening parenthesis (which makes it non-capturing), and add ? after the group so that it matches zero or one instance of the prefix.

(?:.+:)?
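The optional, non-capturing group can be checked in isolation like this (illustrative class and method names). `groupCount()` returns the number of capturing groups in the pattern, so it reports 0 here:

```java
import java.util.regex.Pattern;

class OptionalPrefixDemo {
    // "(?:.+:)?" means: either one "prefix:" or nothing at all, without creating a capture group.
    static final Pattern ELEMENT_NAME = Pattern.compile("(?:.+:)?Password");

    static boolean isPasswordName(String name) {
        return ELEMENT_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPasswordName("abc:Password")); // true  (group matched "abc:")
        System.out.println(isPasswordName("Password"));     // true  (group matched nothing)
        System.out.println(isPasswordName("abcPassword"));  // false (no colon after the prefix)
        System.out.println(ELEMENT_NAME.matcher("Password").groupCount()); // 0
    }
}
```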

Thus, your pattern could change to:

<(?:.+:)?Password>.+</(?:.+:)?Password>
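Put together in Java, a sketch might look like this (the wrapper class and the third, larger sample document are assumptions for illustration). It handles both inputs from the question, and in the single-line sample with an unprefixed `Password` element it leaves the surrounding elements untouched. As the comments point out, this relies on the input being as simple as described (a single password entry).

```java
class ObfuscatePassword {
    // The answer's pattern: the "prefix:" part is optional in both the start and end tag.
    static final String PATTERN = "<(?:.+:)?Password>.+</(?:.+:)?Password>";
    static final String REPLACEMENT = "<!--<Password></Password> Removed-->";

    static String obfuscate(String xml) {
        // Assign or use the result: replaceAll does not modify the original String.
        return xml.replaceAll(PATTERN, REPLACEMENT);
    }

    public static void main(String[] args) {
        System.out.println(obfuscate("<abc:Password>myPassword</abc:Password>"));
        System.out.println(obfuscate("<Password>myPassword</Password>"));
        // Prints "<user><name>bob</name><!--<Password></Password> Removed--></user>".
        System.out.println(obfuscate("<user><name>bob</name><Password>secret</Password></user>"));
    }
}
```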
RToyo
  • Note that `(?:.+:+)*` might kill your regex engine in some cases (see [catastrophic backtracking](https://www.regular-expressions.info/catastrophic.html) or [ReDoS](https://en.wikipedia.org/wiki/ReDoS)) – Thomas Mar 18 '19 at 15:07
  • @Thomas Good point, thank you. The use of `*` was incorrect anyway; it should have been `?`. – RToyo Mar 18 '19 at 15:14