I've followed the excellent solution in the article "PowerShell multiple string replacement efficiency" to try to normalize telephone numbers imported from Active Directory. Here is an example:

$telephoneNumbers = @(
  '+61 2 90237534',
  '04 2356 3713',
  '(02) 4275 7954',
  '61 (0) 3 9635 7899',
  '+65 6535 1943'
)

# Build hashtable of search and replace values.
$replacements = @{
  ' ' = ''
  '(0)' = ''
  '+61' = '0'
  '(02)' = '02'
  '+65' = '001165'
  '61 (0)' = '0'
}

# Join all (escaped) keys from the hashtable into one regular expression.
[regex]$r = @($replacements.Keys | foreach { [regex]::Escape( $_ ) }) -join '|'

[scriptblock]$matchEval = { param( [Text.RegularExpressions.Match]$matchInfo )
  # Return replacement value for each matched value.
  $matchedValue = $matchInfo.Groups[0].Value
  $replacements[$matchedValue]
}


# Perform the replacement on every telephone number and output the result.
$telephoneNumbers |
  foreach {$r.Replace($_,$matchEval)}

I'm having problems with the formatting of the match expressions in the $replacements hashtable. For example, I would like to match all +61 numbers and replace with 0, and match all other + numbers and replace with 0011.

I've tried the following regular expressions but they don't seem to match:

'^+61'

'^+[^61]'

What am I doing wrong? I've tried using \ as an escape character.

Jagged
  • Thanks for the response. Yes the above code works, but would require me to add a hashtable entry for all country codes. Ideally I would like to match +61 and replace with 0, then match NOT +61 and replace with 0011[country code]. – Jagged Oct 22 '15 at 04:55
  • Hmm; what you're doing wrong is twofold. First, the character class `[]` matches individual characters, so `[^61]` means `not (six or one)` rather than "not 61". Second, the `[regex]::Escape()` call is turning everything into string literals, so no advanced regex syntax works; instead it's looking for literally open-square-bracket-caret-six-one-close-square-bracket. The pattern you want is a negative lookahead, `\+(?!61)`: a plus not followed by 61 (non-capturing). But you can't run that through Escape either. And if you take Escape away and escape things by hand in the hashtable, the replacement code can't match it up... – TessellatingHeckler Oct 22 '15 at 05:04
  • I can't think of a nice way around it - maybe others can, but I'd be tempted to do that approach for all the direct swap replacements, and do a separate `-replace '(\+(?!61))','0011'` on each number as well. Unless you need to do this kind of pattern a lot, then you'll have to rework the whole thing, I think. – TessellatingHeckler Oct 22 '15 at 05:15
  • Thanks for the feedback. Looks like this is going to be too much trouble to change. I need to run each match against about 12,000 entries. – Jagged Oct 22 '15 at 05:26
  • How many search/replace patterns are there going to be? In PowerShell 3+ you can do `$telephoneNumbers = $telephoneNumbers -replace 'a','b'` and it will work on all the strings in the array. On my system, that takes ~16 ms over 12,000 random numbers, whereas the loop version takes ~288 ms, and I didn't test a match group lookup in the loop. You could do 15+ of these style replaces in the time it takes to do the loop version, and the code would be simpler and cleaner: `$tn = $tn -replace '\(0\)',''; $tn = $tn -replace '\+61','0'; ...` – TessellatingHeckler Oct 23 '15 at 01:28
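The two points raised in the comments (escaping turns a pattern into a literal, and `[^61]` is a character class rather than "not 61") can be checked directly with `-match`. The snippet below is only an illustrative sketch. The variable names are made up for the demonstration, and the sample numbers are taken from the question.

```powershell
# [regex]::Escape() makes every metacharacter literal, so the anchor is lost.
$escaped = [regex]::Escape('^+61')                        # \^\+61

$literalHit  = '+61 2 90237534' -match $escaped           # False: searches for a literal caret
$anchoredHit = '+61 2 90237534' -match '^\+61'            # True: anchor kept, only the plus escaped

# [^61] means "one character that is neither 6 nor 1", so +65 is rejected too.
$classHit = '+65 6535 1943' -match '^\+[^61]'             # False: the 6 is excluded

# Negative lookahead: a leading plus that is NOT followed by 61.
$otherCountry = '+65 6535 1943'  -match '^\+(?!61)'       # True
$australia    = '+61 2 90237534' -match '^\+(?!61)'       # False

$escaped, $literalHit, $anchoredHit, $classHit, $otherCountry, $australia
```

Because the lookahead consumes no characters, replacing `^\+(?!61)` with `0011` leaves the country code in place, which is the "0011[country code]" behaviour asked for above.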

1 Answer


I've rearranged this a bit. I'm not sure whether it works for your whole situation, but it gives the right results for the example.

I think the key is not to try and create one big regex from the hashtable, but rather to loop over it and check the values in it against the telephone numbers.

The only other change I made was moving the ' ','' replacement from the hash into the code that prints the replacement phone number, as you want this to run in every scenario.

Code is below:

$telephoneNumbers = @(
  '+61 2 90237534',
  '04 2356 3713',
  '(02) 4275 7954',
  '61 (0) 3 9635 7899',
  '+65 6535 1943'
)

$replacements = @{
  '(0)' = ''
  '+61' = '0'
  '(02)' = '02'
  '+65' = '001165'
}

foreach ($t in $telephoneNumbers) {
  # Apply every literal replacement in turn, so a number that matches
  # more than one key is still written out only once.
  foreach ($r in $replacements.GetEnumerator()) {
    $t = $t -replace [regex]::Escape($r.Key), $r.Value
  }
  # Spaces are stripped from every number, matched or not.
  $t -replace ' ', '' | Write-Output
}

Gives:

0290237534
0423563713
0242757954
61396357899
00116565351943
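If regex patterns are needed rather than literal strings (for example the negative lookahead suggested in the comments), one possible variation is to keep the loop but store unescaped patterns in an ordered table, so the rules run in a known sequence. This is only a sketch under assumptions: the name `$rules`, the specific patterns, and the use of `[ordered]` (PowerShell 3+) are illustrative choices, not part of the code above. It also treats `61 (0)` as an Australian prefix, following the question's original hashtable, so that number comes out as `0396357899` rather than the `61396357899` shown above.

```powershell
$telephoneNumbers = @(
  '+61 2 90237534',
  '04 2356 3713',
  '(02) 4275 7954',
  '61 (0) 3 9635 7899',
  '+65 6535 1943'
)

# Keys are regex patterns (hypothetical rule set); [ordered] keeps them in this sequence.
$rules = [ordered]@{
  '\s'        = ''      # strip all whitespace first
  '^\+61'     = '0'     # Australian prefix: +61 -> 0
  '^\+(?!61)' = '0011'  # any other leading plus -> 0011, country code kept
  '^61\(0\)'  = '0'     # 61(0)3... -> 03...
  '[()]'      = ''      # drop any remaining brackets
}

$normalized = foreach ($t in $telephoneNumbers) {
  # Apply each rule in order to the same number.
  foreach ($rule in $rules.GetEnumerator()) {
    $t = $t -replace $rule.Key, $rule.Value
  }
  $t
}

$normalized
```

Adding a new country rule or bracket style then means adding one line to the table, at the cost of one `-replace` pass per rule for each number.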
arco444