
Context: Azure, Windows Server 2012, PowerShell 5

I've got the following code to convert all control characters (ASCII and Unicode whitespace other than \x20 itself) to their ampersand-hash equivalents.

function ConvertTo-AmpersandHash {
  param ([Parameter(Mandatory)][String]$Value)
  # there's got to be a better way of doing this. 
  $AMPERHASH = '&#'
  $SEMICOLON = ';'
  for ($i = 0x0; $i -lt 0x20; $i++) { 
    $value = $value -replace [char]$i,($AMPERHASH + $i + $SEMICOLON) 
  }
  for ($i = 0x7f; $i -le 0xa0; $i++) { 
    $value = $value -replace [char]$i,($AMPERHASH + $i + $SEMICOLON) 
  }
  return $Value 
}

As the embedded comment says, I'm sure there's a better way to do this. As it stands, it runs 66 -replace passes over each incoming string. Would regular expressions work better/faster?

LATER

 -replace '([\x00-\x1f\x7f-\xa0])',('&#' + [byte][char]$1  + ';')

looks promising, but the $1 is evaluating to zero all the time, giving me &#0; all the time.

LATER STILL

Thinking that -replace couldn't internally iterate, I came up with

$t = [char]0 + [char]1 + [char]2 + [char]3 + [char]4 + [char]5 + [char]6
$r = '([\x00-\x1f\x7f-\xa0])'
while ($t -match [regex]$r) {
  $t = $t -replace [regex]$r, ('&#' + [byte][char]$1  + ';')
}
echo $t

However out of that I still get

&#0;&#0;&#0;&#0;&#0;&#0;&#0;

FINALLY

function ConvertTo-AmpersandHash {
  param ([Parameter(Mandatory)][String]$Value)
  $AMPERHASH = '&#'
  $SEMICOLON = ';'
  $patt = '([\x00-\x1f\x7f-\xa0]{1})'
  while ($Value -match [regex]$patt) {
    $Value = $Value -replace $Matches[0], ($AMPERHASH + [byte][char]$Matches[0]  + $SEMICOLON)
  }
  return $Value 
}

That works better. Faster too. Any advances on that?

bugmagnet

3 Answers


Your question is a little unclear to me, and could be a duplicate of "What is the best way to escape HTML-specific characters in a string (PowerShell)?".

It would be nice if you explicitly stated the exact string you have and what you want it to be converted to. As it stands, one has to read the code and guess.

I am guessing one or more of these functions will do what you want:

$a = "http://foo.org/bar?baz & also <value> conversion"
"a"
$a

$b = [uri]::EscapeDataString($a)
"b"
$b
$c = [uri]::UnescapeDataString($b)
"c"
$c

Add-Type -AssemblyName System.Web
$d = [System.Web.HttpUtility]::HtmlEncode($a)
"d"
$d
$e = [System.Web.HttpUtility]::HtmlDecode($d)
"e"
$e

Gives:

a
http://foo.org/bar?baz & also <value> conversion
b
http%3A%2F%2Ffoo.org%2Fbar%3Fbaz%20%26%20also%20%3Cvalue%3E%20conversion
c
http://foo.org/bar?baz & also <value> conversion
d
http://foo.org/bar?baz &amp; also &lt;value&gt; conversion
e
http://foo.org/bar?baz & also <value> conversion
Kory Gill
  • Fair enough. I was wanting to escape whitespace characters other than \x20 (and I should edit my regex above to say \x00-\x1f). I also wanted to escape unicode whitespace characters. – bugmagnet Dec 09 '16 at 02:45

Kory Gill's answer with the library call is surely a better approach, but to address your regex question, you can't evaluate code in the replacement with the -replace operator.

To do that, you need to use the .NET regex Replace method and pass it a scriptblock that evaluates the replacement. The scriptblock takes the match as its parameter, e.g.

PS C:\> [regex]::Replace([string][char]2,
                         '([\x00-\x1f\x7f-\xa0])',
                         {param([string]$m) '&#' + [byte][char]$m + ';'})
&#2;
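
For what it's worth, the attempts in the question print &#0; because PowerShell evaluates ('&#' + [byte][char]$1 + ';') before -replace ever runs, and $1 there is just an unset PowerShell variable, so the cast yields 0. Below is a minimal sketch of how the scriptblock approach might be dropped into the question's function, assuming Windows PowerShell 5. The $pattern variable name and the use of [int] instead of [byte] are my own choices for illustration, not something from the question.

```powershell
function ConvertTo-AmpersandHash {
  param ([Parameter(Mandatory)][String]$Value)
  # Character class taken from the question: \x00-\x1f and \x7f-\xa0.
  # \x20 (plain space) is deliberately left out.
  $pattern = '[\x00-\x1f\x7f-\xa0]'
  # The scriptblock is converted to a MatchEvaluator and runs once per match,
  # so the whole string is handled in a single pass.
  [regex]::Replace($Value, $pattern, {
    param($m)
    '&#' + [int][char]$m.Value + ';'
  })
}

# Usage: tab and non-breaking space are encoded, the ordinary space is kept.
ConvertTo-AmpersandHash ("a" + [char]9 + "b c" + [char]0xa0)
# a&#9;b c&#160;
```

Using [int] rather than [byte] should not matter for this range, but it avoids an overflow error if the character class is later widened beyond \xff.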
TessellatingHeckler

I have a small function that handles this kind of replacement for me:

$SpecChars holds all the characters that will be replaced with nothing.

Function Convert-ToFriendlyName
{
    param ($Text)

    # Unwanted characters (backslash and space), escaped and joined into a regex alternation:
    $SpecChars = '\', ' ', '\\'
    $remspecchars = [string]::join('|', ($SpecChars | % {[regex]::escape($_)}))

    # Convert the text to title case (first letter of each word uppercase)
    $name = (Get-Culture).textinfo.totitlecase("$Text".tolower())

    # Remove unwanted characters
    $name = $name -replace $remspecchars, ""

    $name
}

Example: Convert-ToFriendlyName "My\Name\isRana\Dip " returns "MyNameIsranaDip".

Hope it helps you.

Ranadip Dutta