Context: Azure, Windows Server 2012, PowerShell 5
I've got the following code to convert all control characters (ASCII and Unicode whitespace other than \x20 itself) to their ampersand-hash equivalents, so that a tab, for example, becomes &#9;.
function ConvertTo-AmpersandHash {
    param ([Parameter(Mandatory)][String]$Value)
    # there's got to be a better way of doing this.
    $AMPERHASH = '&#'
    $SEMICOLON = ';'
    for ($i = 0x0; $i -lt 0x20; $i++) {
        $Value = $Value -replace [char]$i,($AMPERHASH + $i + $SEMICOLON)
    }
    for ($i = 0x7f; $i -le 0xa0; $i++) {
        $Value = $Value -replace [char]$i,($AMPERHASH + $i + $SEMICOLON)
    }
    return $Value
}
As the embedded comment suggests, I'm sure there's a better way to do this. As it stands, the function makes 66 replace passes (32 for \x00-\x1f plus 34 for \x7f-\xa0) over each incoming string. Would regular expressions work better/faster?
LATER
-replace '([\x00-\x1f\x7f-\xa0])',('&#' + [byte][char]$1 + ';')
looks promising, but the $1 evaluates to zero every time, so every match comes out as &#0;.
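A likely explanation for the constant zero, sketched under the assumption that strict mode is off: the parenthesised expression is evaluated by PowerShell before -replace ever runs, so $1 there is an ordinary (unset) PowerShell variable rather than the regex capture group. The variable names below are only for illustration.

```powershell
# $1 is an unset PowerShell variable here, so it is $null and casts to 0.
$replacement = '&#' + [byte][char]$1 + ';'
$replacement   # → &#0;

# Inside single quotes, '$1' is passed through to the regex engine and does
# refer to capture group 1, but it can only insert the captured text itself,
# not compute its code point.
$bracketed = "a`tb" -replace '([\x00-\x1f\x7f-\xa0])', '[$1]'
$bracketed     # the tab is kept, now wrapped in square brackets
```

So the replacement operand is a fixed string by the time the regex engine sees it, which would explain why every match produces the same output.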
LATER STILL
Thinking that -replace couldn't iterate internally, I came up with
$t = [char]0 + [char]1 + [char]2 + [char]3 + [char]4 + [char]5 + [char]6
$r = '([\x00-\x1f\x7f-\xa0])'
while ($t -match [regex]$r) {
    $t = $t -replace [regex]$r, ('&#' + [byte][char]$1 + ';')
}
echo $t
However, out of that I still get
&#0;&#0;&#0;&#0;&#0;&#0;&#0;
FINALLY
function ConvertTo-AmpersandHash {
    param ([Parameter(Mandatory)][String]$Value)
    $AMPERHASH = '&#'
    $SEMICOLON = ';'
    $patt = '([\x00-\x1f\x7f-\xa0]{1})'
    while ($Value -match [regex]$patt) {
        $Value = $Value -replace $Matches[0], ($AMPERHASH + [byte][char]$Matches[0] + $SEMICOLON)
    }
    return $Value
}
That works better. Faster too. Any advances on that?
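One possible direction, offered as a sketch rather than a definitive answer: the .NET [regex]::Replace method accepts a MatchEvaluator, and PowerShell will convert a script block to that delegate, so the replacement can be computed per match in a single pass over the string. The function name ConvertTo-AmpersandHashSinglePass and the variable $pattern are made up for this example; the character class is the same one used above.

```powershell
function ConvertTo-AmpersandHashSinglePass {
    param ([Parameter(Mandatory)][String]$Value)
    # Same character class as in the question; no capture group is needed.
    $pattern = '[\x00-\x1f\x7f-\xa0]'
    # The script block is converted to a MatchEvaluator delegate and is
    # invoked once per match, receiving the Match object as its argument.
    [regex]::Replace($Value, $pattern, {
        param($m)
        '&#' + [int][char]$m.Value + ';'
    })
}

ConvertTo-AmpersandHashSinglePass "a`tb`r`nc"   # → a&#9;b&#13;&#10;c
```

This avoids the while loop entirely, since each match is replaced with its own code point as the regex engine walks the string once. Whether it is actually faster on real input would need measuring, for example with Measure-Command.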