In Windows PowerShell v5.1, ConvertTo-Json
indeed unexpectedly encodes &
characters as Unicode escape sequence \u0026
, where 0026
represents hex. number 0x26
, the Unicode code point representing the &
character, U+0026
.
(PowerShell Core, by contrast, preserves the &
as-is.)
That said, JSON parsers should be able to interpret such escape sequences and, indeed, the complementary ConvertFrom-Json
cmdlet is.
- Note: The solutions below are general ones that can handle the Unicode escape sequences of any Unicode character; since
ConvertTo-Json
seemingly only uses these Unicode escape-sequence representations for the characters &
, '
, <
and >
, a simpler solution is possible, unless false positives must be ruled out - see this answer.
That said, if you do want to manually convert Unicode escape sequences into their character equivalents in JSON text, you can use the following - limited solution:
# Sample JSON with Unicode escapes.
$json = '{ "roleFullPath": "Applications\\User Admin \u0026 Support-DEMO" }'
# Replace Unicode escapes with the chars. they represent,
# with limitations.
[regex]::replace($json, '\\u[0-9a-fA-F]{4}', {
param($match) [char] [int] ('0x' + $match.Value.Substring(2))
})
The above yields:
{ "roleFullPath": "Applications\\User Admin & Support-DEMO" }
Note how \u0026
was converted to the char. it represents, &
.
A robust solution requires more work:
There are characters that must be escaped in JSON and cannot be represented literally, so in order for the to-character conversion to work generically, these characters must be excluded.
Additionally, false positives must be avoided; e.g., \\u0026
is not a valid Unicode escape sequence, because a JSON parser interprets \\
as an escaped \
followed by verbatim u0026
.
Finally, the Unicode sequences for "
and \
must be translated into their escaped forms, \"
and \\
, and it is possible to represent a few ASCII-range control characters by C-style escape sequences, e.g., \t
for a tab character (\u0009
).
The following robust solution addresses all these issues:
# Sample JSON with Unicode escape sequences:
# \u0026 is &, which CAN be converted to the literal char.
# \u000a is a newline (LF) character, which CANNOT be converted, but can
# be translated to escape sequence "\n"
# \\u0026 is *not* a Unicode escape sequence and must be preserved as-is.
$json = '{
"roleFullPath": "Applications\u000aUser Admin \u0026 Support-DEMO-\\u0026"
}'
[regex]::replace($json, '(?<=(?:^|[^\\])(?:\\\\)*)\\u([0-9a-fA-F]{4})', {
param($match)
$codePoint = [int] ('0x' + $match.Groups[1].Value)
if ($codePoint -in 0x22, 0x5c) {
# " or \ must be \-escaped.
'\' + [char] $codePoint
}
elseif ($codePoint -in 0x8, 0x9, 0xa, 0xc, 0xd) {
# Control chars. that can be represented as short, C-style escape sequences.
('\b', '\t', '\n', $null, '\f', '\r')[$codePoint - 0x8]
}
elseif ($codePoint -le 0x1f -or [char]::IsSurrogate([char] $codePoint)) {
# Other control chars. and halves of surrogate pairs must be retained
# as escape sequences.
# (Converting surrogate pairs to a single char. would require much more effort.)
$match.Value
}
else {
# Translate to literal char.
[char] $codePoint
}
})
Output:
{
"roleFullPath": "Applications\nUser Admin & Support-DEMO-\\u0026"
}