
Using a regex to find patterns in a capture group; now I need to replace (redact) the values found

I'm trying to replace values in a fixed-length field.

The string is:

01234567890Alice Stone          3978 Smith st...

I need to replace capture group 2 (the full name) with x's (or, better yet, only the first and last name within capture group 2).

Regex: (\d{10})(.{20})(.+)

Replacement: $1xxxxxxxxxxxxxxxxxxxx$3

This works, but I thought there would be a more elegant solution (maybe something like $1 x{20} $3), or, even better, a way to redact only the characters that are letters.

Thanks!

Ansgar Wiechers
Barbara

3 Answers


To build a replacement string whose length must match a (potentially variable-length) substring of the input string, you need to compute the replacement string dynamically, via a script block (delegate).

In PowerShell Core you can now pass a script block directly as the -replace operator's replacement operand:

PS> '01234567890Alice Stone          3978 Smith st...' -replace 
      '(?<=^\d{10}).{20}', { 'x' * $_.Value.Length }

0123456789xxxxxxxxxxxxxxxxxxxx  3978 Smith st...
  • (?<=^\d{10}) is a positive look-behind assertion that requires the first 10 digits to precede the match without including them in it, and .{20} matches the next 20 characters, which make up the entire match.

  • The script block is called for each match with $_ containing the match at hand as a [System.Text.RegularExpressions.Match] instance; .Value contains the matched text.

  • Thus, 'x' * $_.Value.Length returns a string of x characters with the same length as the match.


In Windows PowerShell you have to use the [regex] type directly:

PS> [regex]::Replace('01234567890Alice Stone          3978 Smith st...',
      '(?<=^\d{10}).{20}', { param($m) 'x' * $m.Value.Length })

0123456789xxxxxxxxxxxxxxxxxxxx  3978 Smith st...
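Because the replacement is computed per match, the match itself may vary in length. The sketch below illustrates that with the [regex]::Replace() form shown above, under a purely hypothetical assumption that is not part of the question: the name starts at the first non-digit and ends right before the first run of two or more spaces.

```powershell
# Hypothetical assumption: the name starts at the first non-digit and ends
# right before the first run of two or more spaces, so its length can vary.
$record = '01234567890Alice Stone' + (' ' * 10) + '3978 Smith st...'

# .NET supports variable-length look-behind, so (?<=^\d+) skips all leading digits.
$pattern = '(?<=^\d+)\D.*?(?=\s{2})'

# [regex]::Replace() accepts a script block as its MatchEvaluator, which works in
# both Windows PowerShell and PowerShell Core.
$redacted = [regex]::Replace($record, $pattern, { param($m) 'x' * $m.Value.Length })

# 'Alice Stone' (11 characters) becomes 11 x's, so the record length is unchanged.
$redacted
```

Whatever the name's length turns out to be, the script block emits exactly as many x's, so the fixed-width layout of the rest of the record is preserved.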

If the length of the substring to replace is known in advance, as in your case, you can do it more simply:


PS> $len = 20; '01234567890Alice Stone          3978 Smith st...' -replace 
      "(?<=^\d{10}).{$len}", ('x' * $len)

0123456789xxxxxxxxxxxxxxxxxxxx  3978 Smith st...
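The same idea can be parameterized for both the prefix and the field width. This is a minimal sketch assuming the question's layout (a 10-digit prefix followed by a 20-character name field); the variable names are illustrative only. Since the replacement operand is a plain string rather than a script block, this form should also work in Windows PowerShell.

```powershell
# Assumed layout (from the question): 10-digit prefix, then a 20-character name field.
$prefixLen = 10
$fieldLen  = 20

# Double quotes let PowerShell expand the variables: the pattern becomes (?<=^\d{10}).{20}
$pattern = "(?<=^\d{$prefixLen}).{$fieldLen}"

$record = '01234567890Alice Stone' + (' ' * 10) + '3978 Smith st...'

# A plain-string replacement of exactly $fieldLen x's keeps the record length unchanged.
$redacted = $record -replace $pattern, ('x' * $fieldLen)
$redacted
```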

Unconditionally redacting all letters is even simpler:

PS> '01234567890Alice Stone          3978 Smith st...' -replace '\p{L}', 'x'

01234567890xxxxx xxxxx          3978 xxxxx xx...

\p{L} matches any Unicode letter.
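To illustrate why \p{L} is preferable to an ASCII range such as [a-zA-Z], here is a small comparison on a made-up name containing accented letters (the name is only an example). The characters are built from their code points so the script does not depend on file encoding.

```powershell
# 'Zoë Müller', built from code points so the script is encoding-independent.
$name = "Zo$([char]0x00EB) M$([char]0x00FC)ller"

# \p{L} matches every Unicode letter, accented ones included.
$allLetters = $name -replace '\p{L}', 'x'      # xxx xxxxxx

# A case-sensitive ASCII-only class leaves the accented letters in place.
$asciiOnly = $name -creplace '[a-zA-Z]', 'x'
$allLetters
$asciiOnly
```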


Redacting the letters only in the matching substring requires nesting a -replace operation:

PS> '01234567890Alice Stone          3978 Smith st...' -replace 
      '(?<=^\d{10}).{20}', { $_ -replace '\p{L}', 'x' }

01234567890xxxxx xxxxx          3978 Smith st...
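The nested replacement can also be wrapped in a reusable function. The function name Hide-FieldLetters and its parameters are hypothetical, chosen for illustration. It uses [regex]::Replace() with a script block so that it should run in Windows PowerShell as well as PowerShell Core.

```powershell
# Hypothetical helper: redacts only the letters inside a fixed-width field
# that starts right after a prefix of $PrefixLength digits.
function Hide-FieldLetters {
    param(
        [string] $Record,
        [int] $PrefixLength = 10,
        [int] $FieldLength = 20
    )
    $pattern = "(?<=^\d{$PrefixLength}).{$FieldLength}"
    # The inner -replace only sees the matched field, so letters elsewhere
    # (e.g. in the street name) are left untouched.
    [regex]::Replace($Record, $pattern, { param($m) $m.Value -replace '\p{L}', 'x' })
}

$record = '01234567890Alice Stone' + (' ' * 10) + '3978 Smith st...'
$redacted = Hide-FieldLetters -Record $record
$redacted
```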
mklement0

Maybe this expression would be an option:

([0-9]{11}).+?(\s*[0-9].+)

and the replacement would be:

$1xxxxxxxxxxxxxxxxxxxx$2

If you wish to explore, simplify, or modify the expression, it is explained in the top-right panel of regex101.com, where you can also test how it matches against some sample inputs.
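A quick way to try this expression in PowerShell on the question's sample record is sketched below. Note that the replacement always inserts exactly 20 x's, however long the lazily matched name is, so for this sample the record gets longer.

```powershell
$record = '01234567890Alice Stone' + (' ' * 10) + '3978 Smith st...'

# ([0-9]{11}) captures the 11 leading digits; .+? lazily consumes the name up to the
# first (optionally space-prefixed) digit, which (\s*[0-9].+) captures with the rest.
$result = $record -replace '([0-9]{11}).+?(\s*[0-9].+)', ('$1' + 'x' * 20 + '$2')

# 'Alice Stone' (11 characters) was replaced by 20 x's, so the record grew by 9.
$result
$result.Length - $record.Length
```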


Emma

You can use this:

$data = "0123456789Alice Stone 3978 Smith st..."
[regex]$r = '(\d{10})(.{20})(.+)'

$newstr = $r.Replace($data,'$1'+'x'*20+'$3')

Here, the string 'x' is multiplied by 20, i.e., repeated 20 times.

0123456789xxxxxxxxxxxxxxxxxxxxth st...

As others have shown, capture group 2 isn't required, so it can be simplified to:

[regex]$r = '(\d{10}).{20}(.+)'

$newstr = $r.Replace($data,'$1'+'x'*20+'$2')
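The replacement string above relies on two PowerShell details: * binds more tightly than +, and single quotes keep $1 and $2 literal so that the regex engine, not PowerShell, substitutes the groups. A small sketch showing both, using the same sample string as this answer:

```powershell
# 'x' * 20 is evaluated first, then the three strings are concatenated.
$replacement = '$1' + 'x' * 20 + '$2'

# A 24-character string: the literal $1, twenty x's, and the literal $2.
$replacement

[regex]$r = '(\d{10}).{20}(.+)'
$data = '0123456789Alice Stone 3978 Smith st...'

# The regex engine expands $1 and $2 to the captured groups.
$newstr = $r.Replace($data, $replacement)
$newstr
```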
mjsqu