
In PowerShell (5.1):

When calling an external command (in this case, nssm.exe get logstash-service Application), the output is displayed in PowerShell as I would have expected (the ASCII string "M:\logstash-7.1.1\bin\logstash.bat"):

PS C:\> M:\nssm-2.24\win64\nssm.exe get logstash-service Application

M:\logstash-7.1.1\bin\logstash.bat

But the following command (which pipes the output into Out-Default) results in:

PS C:\> M:\nssm-2.24\win64\nssm.exe get logstash-service Application | Out-Default

M : \ l o g s t a s h - 7 . 1 . 1 \ b i n \ l o g s t a s h . b a t
 

(Please note the "whitespace" separating all the characters of the resulting output string.)

Likewise, the following attempt to capture the output (as an ASCII string) in variable $outCmd results in:

PS C:\> $outCmd = M:\nssm-2.24\win64\nssm.exe get logstash-service Application

PS C:\> $outCmd

M : \ l o g s t a s h - 7 . 1 . 1 \ b i n \ l o g s t a s h . b a t

 

PS C:\>

Again, please note the separating whitespace between the characters.

Why is there a difference in the output between the first command and the last two?

Where are the "spaces" (or other kinds of whitespace characters) in the output of the last two commands coming from?

What exactly needs to be done in order to capture the output of that external command as ASCII string "M:\logstash-7.1.1\bin\logstash.bat" (i.e. without the strange spaces in between)?

If the issue is related to ENCODING, please specify what exactly needs to be done/changed.

Shaneis
Chris



Yes, the problem is one of character encoding, and it often surfaces only when an external program's output is captured in a variable, sent through the pipeline, or redirected to a file.

Only in these cases does PowerShell get involved and decode the output into .NET strings before any further processing.

This decoding is based on the encoding stored in the [Console]::OutputEncoding property, so for programs that do not themselves respect this encoding for their output, you'll have to set this property to match the character encoding they actually use.

Your symptom implies that nssm.exe outputs UTF-16LE-encoded ("Unicode") strings[1], so to capture them properly you'll have to do something like the following:

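$orig = [Console]::OutputEncoding
try {
  # Tell PowerShell to decode nssm.exe's output as UTF-16LE ("Unicode").
  [Console]::OutputEncoding = [System.Text.Encoding]::Unicode

  # Store the output from nssm.exe (an array of strings if there are multiple lines).
  $output = M:\nssm-2.24\win64\nssm.exe get logstash-service Application
}
finally {
  # Restore the original encoding, even if the call above fails.
  [Console]::OutputEncoding = $orig
}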

The underlying problem is that external programs are expected to use the current console's output code page for their output encoding, which defaults to the system's active legacy OEM code page, as reflected in [Console]::OutputEncoding (and reported by chcp), but some do not, in an attempt to:

  • either: overcome the limitations of the legacy, single-byte OEM encodings in order to provide full Unicode support (as is the case here, although it is more common to do that with UTF-8 encoding, as the Node.js CLI, node.exe, does, for instance)
  • or: use the more widely used active ANSI legacy code page instead (as Python does by default).
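To see which encoding PowerShell will currently assume when decoding captured or piped output, you can inspect [Console]::OutputEncoding directly. This is a minimal sketch; the variable names are illustrative, and the code page numbers in the comments are those of .NET's built-in UTF-16LE and UTF-8 encodings, shown for comparison with the value your session reports:

```powershell
# The encoding PowerShell currently uses to decode output from external programs
# that is captured in a variable, piped, or redirected.
$enc = [Console]::OutputEncoding
"Decoding external-program output as: $($enc.EncodingName) (code page $($enc.CodePage))"

# Code pages of the two Unicode encodings mentioned above, for comparison.
$utf16 = [System.Text.Encoding]::Unicode.CodePage   # 1200 (UTF-16LE, "Unicode")
$utf8  = [System.Text.Encoding]::UTF8.CodePage      # 65001 (UTF-8)
```

If the reported code page is a legacy one such as 437 while the program writes UTF-16LE or UTF-8, captured output will be misdecoded.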

See this answer for additional information; it also links to two helper functions: Invoke-WithEncoding, which wraps capturing output from an external program with a given encoding (see the example below), and Debug-NativeInOutput, for diagnosing what encoding a given external program uses.

With the Invoke-WithEncoding function from the linked answer defined, you could then call:

$output = Invoke-WithEncoding -Encoding Unicode { 
            M:\nssm-2.24\win64\nssm.exe get logstash-service Application 
          }

[1] The apparent spaces in the output are actually NUL characters (code point 0x0) that stem from the 0x0 high bytes of UTF-16LE code units in the 8-bit range (U+0000 through U+00FF, which covers all ASCII characters and most of Windows-1252). Because PowerShell decodes the output based on the single-byte OEM code page stored in [Console]::OutputEncoding (e.g., 437 on US-English systems), it interprets each byte as a whole character, so the 0x0 byte of each 2-byte (16-bit) code unit is mistakenly retained as a NUL character; in the console, these characters render like spaces.
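The footnote's mechanism can be reproduced without nssm.exe. This is a minimal sketch, assuming Windows PowerShell 5.1 (where code page 437 is available via [System.Text.Encoding]::GetEncoding()); it encodes the path from the question as UTF-16LE bytes, the way nssm.exe appears to write them, and then decodes those bytes both wrongly (one character per byte, via code page 437) and correctly (as UTF-16LE). The variable names are illustrative only:

```powershell
# The path from the question, encoded as UTF-16LE ("Unicode") bytes,
# i.e. the way nssm.exe appears to write its output.
$text  = 'M:\logstash-7.1.1\bin\logstash.bat'
$bytes = [System.Text.Encoding]::Unicode.GetBytes($text)

# Decoding with the single-byte OEM code page 437 turns every byte into a character,
# so each 0x0 high byte survives as a NUL (U+0000) character.
$wrong = [System.Text.Encoding]::GetEncoding(437).GetString($bytes)

# Decoding with the encoding actually used restores the original string.
$right = [System.Text.Encoding]::Unicode.GetString($bytes)

$wrong.Length     # 68: one character per byte (2 x 34)
[int]$wrong[1]    # 0: the high byte of 'M', kept as a NUL character
$right -eq $text  # True
```

Printing $wrong in the console should reproduce the spaced-out string from the question.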

mklement0