RoleFullPath

Applications\User Admin & Support-DEMO

PowerShell Code

$NewJSON.roleFullPath = $Line.RoleFullPath
.
.
.
.
$JSONPath = $RolePath + $FolderName + "-JSON.json"
ConvertTo-Json $NewJSON | Out-File -Encoding "UTF8" $JSONPath

Output:

"roleFullPath":  "Applications\\User Admin \u0026 Support-DEMO"

When converting from CSV to JSON, the '&' character is converted to '\u0026'.

How can I prevent this?

mklement0
RDN

2 Answers

In Windows PowerShell v5.1, ConvertTo-Json indeed unexpectedly encodes & characters as the Unicode escape sequence \u0026, where 0026 is the hexadecimal number 0x26, the Unicode code point of the & character (U+0026).
(PowerShell Core, by contrast, preserves the & as-is.)

That said, JSON parsers should be able to interpret such escape sequences and, indeed, the complementary ConvertFrom-Json cmdlet can.
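As a quick check of that claim, the following minimal sketch parses the JSON from the question with ConvertFrom-Json; the variable names are only illustrative.

```powershell
# JSON as emitted by Windows PowerShell's ConvertTo-Json, with & encoded as \u0026.
$json = '{ "roleFullPath": "Applications\\User Admin \u0026 Support-DEMO" }'

# Parsing decodes both the \\ and the \u0026 escape sequences.
$obj = $json | ConvertFrom-Json
$obj.roleFullPath  # Applications\User Admin & Support-DEMO
```

So if the file is only ever read back by a JSON parser, no conversion is needed.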

  • Note: The solutions below are general ones that can handle the Unicode escape sequences of any Unicode character; since ConvertTo-Json seemingly only uses these Unicode escape-sequence representations for the characters &, ', < and >, a simpler solution is possible, unless false positives must be ruled out - see this answer.
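A minimal sketch of what such a simpler approach could look like, assuming only the four characters named above are affected. It is not taken from the linked answer, and it does not rule out false positives such as \\u0026.

```powershell
# Sample JSON with one of the four escape sequences in question.
$json = '{ "roleFullPath":  "Applications\\User Admin \u0026 Support-DEMO" }'

# Replace only the escape sequences for &, ', < and >.
# Caveat: a verbatim \\u0026 in the JSON would be (incorrectly) modified too.
$fixed = $json -replace '\\u0026', '&' -replace '\\u0027', "'" -replace '\\u003c', '<' -replace '\\u003e', '>'
$fixed
```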

If you nevertheless want to manually convert Unicode escape sequences into their character equivalents in JSON text, you can use the following solution, which has limitations:

# Sample JSON with Unicode escapes.
$json = '{ "roleFullPath":  "Applications\\User Admin \u0026 Support-DEMO" }'

# Replace Unicode escapes with the chars. they represent,
# with limitations.
[regex]::replace($json, '\\u[0-9a-fA-F]{4}', { 
  param($match) [char] [int] ('0x' + $match.Value.Substring(2)) 
})

The above yields:

{ "roleFullPath":  "Applications\\User Admin & Support-DEMO" }

Note how \u0026 was converted to the char. it represents, &.
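To make the conversion inside the script block easier to follow, here is the same step spelled out for a single escape sequence; the intermediate variable names are only illustrative.

```powershell
$escape = '\u0026'

# Strip the leading \u, leaving the four hex digits.
$hex = $escape.Substring(2)        # '0026'

# PowerShell converts a '0x'-prefixed string to an integer.
$codePoint = [int] ('0x' + $hex)   # 38

# Cast the code point to the character it represents.
$char = [char] $codePoint          # '&'
```

`[Convert]::ToInt32($hex, 16)` is an equivalent way to parse the hex digits.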

A robust solution requires more work:

  • There are characters that must be escaped in JSON and cannot be represented literally, so in order for the to-character conversion to work generically, these characters must be excluded.

  • Additionally, false positives must be avoided; e.g., \\u0026 is not a valid Unicode escape sequence, because a JSON parser interprets \\ as an escaped \ followed by verbatim u0026.

  • Finally, the Unicode sequences for " and \ must be translated into their escaped forms, \" and \\, and it is possible to represent a few ASCII-range control characters by C-style escape sequences, e.g., \t for a tab character (\u0009).
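To illustrate the false-positive case from the list above, this sketch applies the limited regex to a made-up JSON string that contains a verbatim \\u0026; the property name and value are hypothetical.

```powershell
# \\u0026 is an escaped backslash followed by the verbatim text u0026.
$json = '{ "path": "C:\\u0026" }'

# The limited regex matches the second backslash plus u0026 and replaces it.
$naive = [regex]::Replace($json, '\\u[0-9a-fA-F]{4}', {
  param($match) [char] [int] ('0x' + $match.Value.Substring(2))
})

$naive  # { "path": "C:\&" }
```

The result contains \&, which is not a valid JSON escape sequence, so a strict parser is expected to reject it.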

The following robust solution addresses all these issues:

# Sample JSON with Unicode escape sequences:
#  \u0026 is &, which CAN be converted to the literal char.
#  \u000a is a newline (LF) character, which CANNOT be converted, but can
#  be translated to escape sequence "\n"
# \\u0026 is *not* a Unicode escape sequence and must be preserved as-is.
$json = '{ 
  "roleFullPath": "Applications\u000aUser Admin \u0026 Support-DEMO-\\u0026" 
}'

[regex]::replace($json, '(?<=(?:^|[^\\])(?:\\\\)*)\\u([0-9a-fA-F]{4})', {
  param($match)
  $codePoint = [int] ('0x' + $match.Groups[1].Value)
  if ($codePoint -in 0x22, 0x5c) { 
    # " or \ must be \-escaped.
    '\' + [char] $codePoint
  } 
  elseif ($codePoint -in 0x8, 0x9, 0xa, 0xc, 0xd) { 
    # Control chars. that can be represented as short, C-style escape sequences.
    ('\b', '\t', '\n', $null, '\f', '\r')[$codePoint - 0x8]
  }
  elseif ($codePoint -le 0x1f -or [char]::IsSurrogate([char] $codePoint)) {
    # Other control chars. and halves of surrogate pairs must be retained
    # as escape sequences.
    # (Converting surrogate pairs to a single char. would require much more effort.)
    $match.Value
  }
  else {
    # Translate to literal char.
    [char] $codePoint
  }
})

Output:

{ 
  "roleFullPath": "Applications\nUser Admin & Support-DEMO-\\u0026" 
}
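For reuse in a pipeline like the one in the question, the same logic could be wrapped in a function. This is only a sketch: the function name Convert-JsonUnicodeEscape is hypothetical, and the sample object stands in for the question's $NewJSON.

```powershell
# Hypothetical helper; the regex and script block mirror the robust solution above.
function Convert-JsonUnicodeEscape {
  param([Parameter(Mandatory, ValueFromPipeline)] [string] $Json)
  process {
    [regex]::Replace($Json, '(?<=(?:^|[^\\])(?:\\\\)*)\\u([0-9a-fA-F]{4})', {
      param($match)
      $codePoint = [int] ('0x' + $match.Groups[1].Value)
      if ($codePoint -in 0x22, 0x5c) {
        # " or \ must be \-escaped.
        '\' + [char] $codePoint
      }
      elseif ($codePoint -in 0x8, 0x9, 0xa, 0xc, 0xd) {
        # Control chars. with short, C-style escape sequences.
        ('\b', '\t', '\n', $null, '\f', '\r')[$codePoint - 0x8]
      }
      elseif ($codePoint -le 0x1f -or [char]::IsSurrogate([char] $codePoint)) {
        # Keep other control chars. and surrogate halves as escape sequences.
        $match.Value
      }
      else {
        [char] $codePoint
      }
    })
  }
}

# Stand-in for the object built in the question.
$obj = [pscustomobject] @{ roleFullPath = 'Applications\User Admin & Support-DEMO' }
$json = ConvertTo-Json $obj | Convert-JsonUnicodeEscape
$json
```

The result can then be written to a file as in the question, and it should still parse to the same value with ConvertFrom-Json.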
mklement0
To stop PowerShell from doing this, pipe your JSON output through the following:

$jsonOutput | ForEach-Object { [System.Text.RegularExpressions.Regex]::Unescape($_) } | Set-Content $jsonPath -Encoding UTF8;

This will prevent the & being converted :)

James
  • While `[regex]::Unescape()` conveniently converts Unicode escape sequences into the actual characters they represent, applying this method to JSON text will _invalidate it_ as JSON if it contains other instances of `\` characters, such as the `\\` sequence in the question. – mklement0 Sep 27 '21 at 14:09
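The problem described in this comment can be reproduced with the JSON from the question; the sketch below only shows the resulting string.

```powershell
$json = '{ "roleFullPath": "Applications\\User Admin \u0026 Support-DEMO" }'

# Unescape() decodes \u0026 to &, but also collapses \\ to a single \.
$unescaped = [regex]::Unescape($json)
$unescaped  # { "roleFullPath": "Applications\User Admin & Support-DEMO" }
```

The single backslash now forms \U, which is not a valid JSON escape sequence, so the text is no longer valid JSON.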