Below is the custom function Debug-String, which visualizes control characters in strings:
- where a native PowerShell escape is available, using PowerShell's own `-prefixed escape-sequence notation (e.g., `r for CR), falling back to caret notation otherwise (e.g., the ASCII-range control character with code point 0x4, END OF TRANSMISSION, is represented as ^D).
  - Alternatively, you can use the -CaretNotation switch to represent all ASCII-range control characters in caret notation, which gives you output similar to cat -A on Linux and cat -et on macOS/BSD.
- all other control characters, namely those outside the ASCII range (which spans code points 0x0 - 0x7F), are represented in the form `u{<hex>}, where <hex> is the hexadecimal representation of the code point with up to 6 digits; e.g., `u{85} is Unicode character U+0085, the NEXT LINE control character. This notation is now also supported in expandable strings ("..."), but only in PowerShell Core (v6+).
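To make the caret-notation rule concrete, here is a minimal sketch of the conventional mapping: the code point is XOR-ed with 64, which is the same as adding 64 for code points 0x0 - 0x1F and yields ^? for DEL (0x7F). The helper name ConvertTo-CaretNotation is hypothetical and exists only for this illustration; it is not part of Debug-String.

```powershell
# Hypothetical helper, for illustration only: maps an ASCII-range control
# character to its caret notation by flipping bit 6 (XOR with 64).
function ConvertTo-CaretNotation {
  param([char] $Char)
  $codePoint = [int] $Char
  if (($codePoint -ge 0 -and $codePoint -le 31) -or $codePoint -eq 127) {
    '^' + [char] ($codePoint -bxor 64)
  }
  else {
    throw "Not an ASCII-range control character: U+$($codePoint.ToString('X4'))"
  }
}

ConvertTo-CaretNotation ([char] 4)    # ^D (END OF TRANSMISSION)
ConvertTo-CaretNotation ([char] 9)    # ^I (tab)
ConvertTo-CaretNotation ([char] 127)  # ^? (DEL)
```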
Applied to your use case, you'd use the following (requires PSv3+, due to use of Get-Content -Raw to ensure that the file is read as a whole; without it, information about the line endings would be lost):
Get-Content -Raw $file | Debug-String
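The following sketch shows why -Raw matters here. The temporary file and its content are made up for the illustration; only Get-Content's behavior is the point.

```powershell
# Illustration only: create a temporary file with one CRLF and one LF line ending.
$file = [IO.Path]::GetTempFileName()
[IO.File]::WriteAllText($file, "line 1`r`nline 2`n")

# Without -Raw: an array of lines, with the line endings stripped.
$lines = Get-Content $file
$lines.Count                        # 2
($lines -join '') -match '[\r\n]'   # False: no CR or LF survives

# With -Raw: a single string, with CRLF and LF preserved as-is.
$raw = Get-Content -Raw $file
$raw -match '\r\n'                  # True

Remove-Item $file
```

Piping $lines to Debug-String would therefore show no line endings at all, whereas piping $raw shows each CR and LF exactly as stored in the file.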
Two simple examples:
Using PowerShell's escape-sequence notations. Note that this only looks like a no-op: the `-prefixed sequences inside "..." strings create actual control characters.
PS> "a`ab`t c`0d`r`n" | Debug-String
a`ab`t c`0d`r`n
Using -CaretNotation, with output similar to cat -A on Linux:
PS> "a`ab`t c`0d`r`n" | Debug-String -CaretNotation
a^Gb^I c^@d^M$
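To see that the first example is not actually a no-op, you can inspect the input string directly. This is a small sketch using only built-in string members; the variable names are arbitrary.

```powershell
# Expandable ("...") string: `-prefixed sequences become actual control characters.
$s = "a`ab`t c`0d`r`n"
$s.Length                 # 10
$codePoints = $s.ToCharArray() | ForEach-Object { [int] $_ }
$codePoints -join ' '     # 97 7 98 9 32 99 0 100 13 10

# Verbatim ('...') string: the backticks and the letters after them are kept literally.
$literal = 'a`ab`t c`0d`r`n'
$literal.Length           # 15
```

Debug-String receives the 10-character string and translates the control characters back into a visible form, which is why its output happens to look like the source code of the input.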
Debug-String source code:
Note: The function below is also available as an MIT-licensed Gist with additional functionality, notably showing spaces as ·, the option to show non-ASCII characters as escape sequences (-UnicodeEscapes), and the option to print a string as a PowerShell string literal (-AsSourceCode). Only the Gist will be maintained going forward.
Assuming you have looked at the linked code to ensure that it is safe (which I can personally assure you of, but you should always check), you can install it directly as follows:
irm https://gist.github.com/mklement0/7f2f1e13ac9c2afaf0a0906d08b392d1/raw/Debug-String.ps1 | iex
Function Debug-String {
  param(
    [Parameter(ValueFromPipeline, Mandatory)]
    [string] $String
    ,
    [switch] $CaretNotation
  )
  begin {
    # \p{C} matches any character in Unicode's "Other" (C) category, which
    # includes control characters both inside and outside the ASCII range;
    # note that tabs (`t) are control characters too, but spaces are not.
    $re = [regex] '\p{C}'
  }
  process {
    $re.Replace($String, {
      param($match)
      $handled = $False
      if (-not $CaretNotation) {
        # Translate control chars. that have native PS escape sequences into them.
        # Note: [int] rather than [Int16] is used, so that code points
        # above 0x7FFF don't cause an overflow error.
        $handled = $True
        switch ([int] [char] $match.Value) {
          0 { '`0'; break }
          7 { '`a'; break }
          8 { '`b'; break }
          12 { '`f'; break }
          10 { '`n'; break }
          13 { '`r'; break }
          9 { '`t'; break }
          11 { '`v'; break }
          default { $handled = $false }
        } # switch
      }
      if (-not $handled) {
        switch ([int] [char] $match.Value) {
          10 { '$'; break } # cat -A / cat -e visualizes LFs as '$'
          # If it's a control character in the ASCII range,
          # use caret notation too (C0 range, plus DEL).
          # See https://en.wikipedia.org/wiki/Caret_notation
          { $_ -ge 0 -and $_ -le 31 -or $_ -eq 127 } {
            # Caret notation is based on the character obtained by XOR-ing the
            # control-character code point with the code point of '@' (64):
            # for 0 - 31 this is the same as adding 64; for DEL (127) it yields '?'.
            '^' + [char] ($_ -bxor 64)
            break
          }
          # NON-ASCII control characters; use the - PS Core-only - Unicode
          # escape-sequence notation:
          default { '`u{{{0}}}' -f ([int] $_).ToString('x') }
        }
      } # if (-not $handled)
    }) # .Replace
  } # process
}
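The core technique in the function is passing a script block to Regex.Replace(), where it acts as a match evaluator that is called once per match and whose output becomes the replacement text. Here is a minimal, standalone sketch of just that mechanism; the <U+XXXX> output format is made up for the illustration and is not what Debug-String produces.

```powershell
# Replace every \p{C} match with a made-up <U+XXXX> placeholder.
$re = [regex] '\p{C}'
$result = $re.Replace("tab`there", {
  param($match)
  # $match.Value is the matched text, here always a single character.
  '<U+{0:X4}>' -f [int] [char] $match.Value
})
$result   # tab<U+0009>here
```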
For brevity I haven't included the comment-based help above; here it is:
<#
.SYNOPSIS
Outputs a string in diagnostic form.
.DESCRIPTION
Prints a string with normally hidden control characters visualized.
Common control characters are visualized using PowerShell's own escaping
notation by default, such as
"`t" for a tab, "`n" for a LF, and "`r" for a CR.
Any other control characters in the ASCII range (C0 control characters)
are represented in caret notation (see https://en.wikipedia.org/wiki/Caret_notation).
If you want all ASCII-range control characters visualized in caret notation,
except for LF, which is visualized as "$", similar to `cat -A` on Linux,
use -CaretNotation.
Non-ASCII control characters are visualized by their Unicode code point
in the form `u{<hex>}, where <hex> is the hex. representation of the
code point with up to 6 digits; e.g., `u{85} is U+0085, the NEXT LINE
control char.
.PARAMETER CaretNotation
Causes LF to be visualized as "$" and all other ASCII-range control characters
in caret notation, similar to `cat -A` on Linux.
.EXAMPLE
PS> "a`ab`t c`0d`r`n" | Debug-String
a`ab`t c`0d`r`n
.EXAMPLE
PS> "a`ab`t c`0d`r`n" | Debug-String -CaretNotation
a^Gb^I c^@d^M$
#>