I'm working on a PowerShell script in which the output of several commands is shown in the window and appended to a file or a variable. It worked correctly until I used the sfc command: when piped or redirected, its output is "broken":

> sfc /?
Vérificateur de ressources Microsoft (R) Windows (R) version 6.0[...]

> sfc /? | Tee-Object -Variable content
 V Ú r i f i c a t e u r   d e   r e s s o u r c e s   M i c r o s o f t   ( R )   W i n d o w s   ( R )   v e r s i o  á 6 . 0[...]

Are there other commands like sfc that are formatted in the same way, or whose output will be broken when redirected?


EDIT

PowerShell sample code, using the code from the accepted answer:

# Run a command
function RunCommand([ScriptBlock] $command) {

    # Run the command and write the output to the window and to a variable ("SFC" formatting)
    $stringcommand = $command.ToString().Trim()
    # -match is case-insensitive; the dot is escaped so that only a literal ".exe" matches
    if ($stringcommand -match '^sfc(\.exe)?( .*)?$') {
        $oldEncoding = [console]::OutputEncoding
        [console]::OutputEncoding = [Text.Encoding]::Unicode
        try {
            # Single quotes: the `r`n escapes are only expanded when the new script block runs
            $command = [ScriptBlock]::Create('(' + $stringcommand + ') -join "`r`n" -replace "`r`n`r`n", "`r`n"')
            & $command 2>&1 | Tee-Object -Variable out_content
        } finally {
            # Restore the previous encoding even if the command fails
            [console]::OutputEncoding = $oldEncoding
        }

    # Run the command and write the output to the window and to a variable (normal formatting)
    } else {
        & ($command) 2>&1 | Tee-Object -Variable out_content
    }

    # Manipulate output variable, write it to a file...
    # ...
    return
}

# Run commands
RunCommand {ping 127.0.0.1}
RunCommand {sfc /?}
[void][System.Console]::ReadKey($true)
exit

CMD sample code, using the more command to format the sfc output:

@echo off
setlocal enabledelayedexpansion
set "tmpfile=%TEMP%\temp.txt"
set "outputfile=%TEMP%\output.txt"

REM; Run commands
call :RunCommand "ping 127.0.0.1"
call :RunCommand "sfc"
pause
exit /b

REM; Run a command
:RunCommand

    REM; Run the command and write the output to the window and to the temp file
    set "command=%~1"
    (!command! 2>&1) >!tmpfile!

    REM; Write the output to the window and to the output file ("SFC" formatting)
    set "isSFC=0"
    (echo !command!|findstr /I /R /C:"^SFC$" > NUL) && (set "isSFC=1")
    (echo !command!|findstr /I /R /C:"^SFC\.exe$" > NUL) && (set "isSFC=1")
    (echo !command!|findstr /I /R /C:"^SFC .*$" > NUL) && (set "isSFC=1")
    (echo !command!|findstr /I /R /C:"^SFC\.exe .*$" > NUL) && (set "isSFC=1")
    (if !isSFC! equ 1 (
        (set \n=^
%=newline=%
)
        set "content="
        (for /f "usebackq tokens=* delims=" %%a in (`more /p ^<"!tmpfile!"`) do (
            set "line=%%a"
            set "content=!content!!line!!\n!"
        ))
        echo.!content!
        (echo.!content!) >>!outputfile!

    REM; Write the output to the window and to the locked output file (normal formatting)
    ) else (
        type "!tmpfile!"
        (type "!tmpfile!") >>!outputfile!
    ))
goto :EOF
Deaudouce
  • that LOOKS like a character encoding problem. what happens if you simply assign the output of `sfc` to a $Var like so >>> `$SFC_Output = sfc` <<< and then use the `$SFC_Output` variable? – Lee_Dailey Sep 01 '19 at 22:46
  • @Lee_Dailey Thanks for the suggestion, but the variable content is also broken. – Deaudouce Sep 01 '19 at 23:26
  • you will need to find some way to deal with that. i don't know how to do so, tho. it is decidedly unexpected that the $Var assignment would also have the problem ... but not the direct to screen display. [*sigh ...*] it does appear to be a locale/encoding issue ... but i have no clue on how to proceed. – Lee_Dailey Sep 01 '19 at 23:47
  • @Lee_Dailey: If no capturing by PowerShell is involved (variable assignment, routing through `Tee-Object`), `sfc` writes _directly to the console_, so the problem doesn't occur; see the footnote in my answer for details. – mklement0 Sep 02 '19 at 13:31

2 Answers


As noted in js2010's answer, the sfc.exe utility - surprisingly - outputs text that is UTF-16LE ("Unicode") encoded.

Since PowerShell doesn't expect that, it misinterprets sfc's output.[1]

The solution is to (temporarily) change [console]::OutputEncoding to UTF-16LE, which tells PowerShell / .NET what character encoding to expect from external programs, i.e., how to decode external-program output to .NET strings (which are stored as UTF-16 code units in memory).
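The misinterpretation can be reproduced in-process, without running sfc.exe. The following minimal sketch encodes a French word as UTF-16LE bytes, as sfc.exe does, then decodes those bytes once with a single-byte code page and once as UTF-16LE. ISO-8859-1 (code page 28591) is an assumption standing in for the console's OEM code page, chosen only because every .NET version supports it; with code page 850, the é would come out as Ú, as in the question's output.

```powershell
# Simulate how sfc.exe's UTF-16LE output is misread vs. correctly decoded.
# Assumption: ISO-8859-1 (code page 28591) stands in for the console's OEM code page.
$text  = "V$([char]0x00E9)rificateur"               # 'Vérificateur', spelled ASCII-only
$bytes = [Text.Encoding]::Unicode.GetBytes($text)   # UTF-16LE: 2 bytes per character here

# Wrong: each byte becomes one character, so every other character is a NUL (`0)
$wrong = [Text.Encoding]::GetEncoding(28591).GetString($bytes)

# Right: decode as UTF-16LE, which is what setting
# [console]::OutputEncoding = [Text.Encoding]::Unicode makes PowerShell do
$right = [Text.Encoding]::Unicode.GetString($bytes)

'Misread: {0} characters; decoded correctly: {1} characters' -f $wrong.Length, $right.Length
```

It prints `Misread: 24 characters; decoded correctly: 12 characters`: the misread string has a NUL after every character, which the console renders as the spaces seen in the question.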

However, there's an additional problem that looks like a bug: bizarrely, sfc.exe uses CRCRLF (`r`r`n) sequences as line breaks rather than the Windows-customary CRLF (`r`n) newlines.

PowerShell, when it captures stdout output from external programs, returns an array of lines rather than a single multi-line string, and it treats the following newline styles interchangeably: CRLF (Windows-style), LF (Unix-style), and CR (obsolete Mac-style - very rare these days).
Therefore, it treats CRCRLF as two newlines, which is why both the "teed" output and the output captured in a variable contain extra, empty lines.
The solution is therefore to join the array elements with the standard CRLF newline sequence, (sfc /?) -join "`r`n", and then to replace each pair of consecutive `r`n sequences with just one, to remove the artificially introduced line breaks: -replace "`r`n`r`n", "`r`n".
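The CRCRLF effect and the fix can be simulated without sfc.exe. In this sketch, `$raw` is a made-up stand-in for sfc's output, and `[IO.StringReader]::ReadLine()` is used because, like PowerShell when it captures external-program output, it treats CRLF, LF and CR alike as line terminators:

```powershell
# Made-up stand-in for sfc.exe output: every line ends in CRCRLF, and the
# third "line" is a genuine blank line.
$raw = "Line 1`r`r`nLine 2`r`r`n`r`r`nLine 3"

# Split the way PowerShell does: CRLF, LF and CR are all line terminators,
# so each CRCRLF yields an extra, empty line.
$reader = [IO.StringReader]::new($raw)
$lines  = while ($null -ne ($line = $reader.ReadLine())) { $line }
'{0} lines after splitting' -f $lines.Count        # 7 instead of 4

# The workaround: join with CRLF, then collapse each doubled CRLF.
$fixed = $lines -join "`r`n" -replace "`r`n`r`n", "`r`n"
$fixed -split "`r`n"                               # Line 1, Line 2, (blank), Line 3
```

Note that the genuine blank line survives: it arrives as four CRLF sequences after joining, which the replacement reduces to two.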

To put it all together:

# Save the current output encoding and switch to UTF-16LE
$prev = [console]::OutputEncoding
[console]::OutputEncoding = [Text.Encoding]::Unicode

# Invoke sfc.exe, whose output is now correctly interpreted and
# apply the CRCRLF workaround.
# You can also send output to a file, but note that Windows PowerShell's 
# > redirection again uses UTF-16LE encoding.
# Best to use ... | Set-Content/Add-Content -Encoding ... 
(sfc /?) -join "`r`n" -replace "`r`n`r`n", "`r`n" | Tee-Object -Variable content

# Restore the previous output encoding, which is the system's 
# active OEM code page, which should work for other programs such
# as ping.exe
[console]::OutputEncoding = $prev

Note that $content will then contain a single, multi-line string; use $content -split "`r`n" to split into an array of lines.


As for:

Are there other commands like "sfc" that are formatted in the same way, or that will result in a broken output if redirected?

Not that I'm personally aware of; unconditional UTF-16LE output, as in sfc.exe's case, strikes me as unusual (other programs may offer that on an opt-in basis).

Older console programs with a Windows-only heritage use a (possibly fixed) OEM code page, which is typically a single-byte (8-bit) encoding that is a superset of ASCII, such as code page 850 on Western European systems.

Increasingly, modern, multi-platform console programs use UTF-8 (e.g., the Node.js CLI), which is a variable-width encoding capable of encoding all Unicode characters and backward-compatible with ASCII (that is, UTF-8 encodes every character in the 7-bit ASCII range as a single, ASCII-compatible byte).
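The variable-width property, and the contrast with the UTF-16LE encoding that sfc.exe uses, can be checked by counting the bytes each encoding needs per character. A small sketch; the sample characters are arbitrary choices:

```powershell
# Compare how many bytes UTF-8 and UTF-16LE ("Unicode" in .NET) need per character.
# The sample characters are arbitrary: ASCII 'A', 'é', '€', and U+1D11E (outside the BMP).
$samples = 'A', [string][char]0x00E9, [string][char]0x20AC, [char]::ConvertFromUtf32(0x1D11E)

$table = foreach ($s in $samples) {
    [pscustomobject] @{
        CodePoint = 'U+{0:X4}' -f [char]::ConvertToUtf32($s, 0)
        Utf8      = [Text.Encoding]::UTF8.GetByteCount($s)
        Utf16LE   = [Text.Encoding]::Unicode.GetByteCount($s)
    }
}
$table   # UTF-8: 1, 2, 3, 4 bytes; UTF-16LE: 2, 2, 2, 4 bytes
```

Only the ASCII character fits in one UTF-8 byte, which is what makes UTF-8 ASCII-compatible; UTF-16LE never produces single bytes, which is why decoding it as a single-byte code page goes wrong.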

If you want to make your PowerShell sessions and potentially all console windows fully UTF-8 aware, see this answer (however, doing so still requires the above workaround for sfc).


[1] Direct-to-console output:

When sfc output is neither captured by PowerShell nor routed through a cmdlet such as Tee-Object, sfc writes directly to the console, presumably using the Unicode version of the WriteConsole Windows API function, which expects UTF-16LE strings.

Writing to the console this way allows printing all Unicode characters, irrespective of what code page (reflected in chcp / [console]::OutputEncoding) is currently active. (While the rendering of certain characters may fall short, due to limited font support and lack of support for (the rare) characters outside the BMP (Basic Multilingual Plane), the console buffer correctly preserves all characters, so copying and pasting elsewhere may render correctly there - see the bottom section of this answer.)

Therefore, direct-to-console output is not affected by the misinterpretation and typically prints as expected.

mklement0
  • Thank you again, but as you saw in my other [question](https://stackoverflow.com/questions/57733552/redirect-powershell-output-and-errors-to-console-in-real-time-and-to-a-variabl/57760283), if the command is stored in a variable and I need to redirect errors, is this the proper way to do it? `` . { (. $command) -join "`r`n" -replace "`r`n`r`n", "`r`n" } 2>&1 | Tee-Object -Variable content `` – Deaudouce Sep 02 '19 at 23:05
  • @Deaudouce: If `$command` contains a _string_ - which I recommend against, as explained in the linked question - you need to use `(Invoke-Expression $command)` instead of `(. $command)`. – mklement0 Sep 02 '19 at 23:19
  • I modified my code to store commands as `ScriptBlocks`, but I now have another weird issue. With `sfc` processing commands, the output is not shown in real-time but only at the end. In fact, without modifying any code, and by running the same script over and over to test it, it appears that the output is correctly shown in real-time about one time out of 15 (I tried with `&` and `.` operators to call `ScriptBlocks`). It's very strange. But I don't have this problem for the other commands I tried (`ping`, `Dism`...). – Deaudouce Sep 03 '19 at 22:19
  • @Deaudouce: An expression such as `(sfc /?)` never _streams_: it always fully collects its output first. To get a streaming version of the solution above, consider using a `switch` statement that keeps state and detects consecutive newlines that way. If you need help with that, please ask a _new_ question (feel free to ping me here once you've done so). – mklement0 Sep 04 '19 at 01:54
  • Sorry, I was talking about `sfc /scannow`, which first outputs that the process begins, then outputs a progressbar during the processing, and the results at the end. But sometimes the progress bar is correctly shown and updated, like if I had manually entered the command, sometimes nothing is shown until the end. – Deaudouce Sep 04 '19 at 21:23
  • @Deaudouce: I have no explanation, but generally speaking, such continuous, in-place progress updates aren't really suited to being _captured_ - they are meant for interactive end-user feedback. – mklement0 Sep 04 '19 at 22:25
  • @Deaudouce I took a quick look: It seems that with `/scannow` streaming doesn't start until the _first progress message_ is output (not sure why), which can take quite a while. The on-screen in-place updating of the progress percentage is done by appending a CR at the end of each progress message, which causes the following one to print on the _same_ line, and so on. With capturing / piping, you'll see each progress message on a separate line, resulting in hundreds of messages (each progress percentage point (e.g., `5%`) can occur multiple times). – mklement0 Sep 05 '19 at 16:25
  • Thank you for your research, I'll leave it like this for now. – Deaudouce Sep 09 '19 at 16:08
  • @mklement0 Looks like db2 has a similar output encoding? https://stackoverflow.com/questions/61515432/powershell-tee-object-generates-empty-lines-in-output-when-used-in-db2-commands – js2010 May 01 '20 at 19:49
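mklement0's comment above suggests a stateful, streaming alternative to collecting all of sfc's output with `(sfc /?)` first. The following is a minimal sketch of that idea, using an advanced function with a begin/process pair instead of a `switch` statement. The function name `Remove-CrCrLfArtifact` is hypothetical, and the sketch assumes that every line ends in CRCRLF, so that each real line reaches PowerShell followed by exactly one spurious empty line. Output that breaks this pattern, such as the in-place progress updates of `sfc /scannow` described above, may not be cleaned up correctly.

```powershell
# Hypothetical helper: drops the spurious empty line that PowerShell produces
# after each CRCRLF-terminated line, processing one line at a time.
function Remove-CrCrLfArtifact {
    param([Parameter(ValueFromPipeline)] [string] $Line)
    begin { $expectArtifact = $false }
    process {
        if ($expectArtifact -and $Line -eq '') {
            $expectArtifact = $false      # swallow the artifact line
            return
        }
        $Line                             # pass the real line through immediately
        $expectArtifact = $true           # the next empty line is an artifact
    }
}

# Simulated captured lines: 'Line 1', a genuine blank line, 'Line 3',
# each followed by an artifact empty line
$result = 'Line 1', '', '', '', 'Line 3', '' | Remove-CrCrLfArtifact
$result.Count   # 3
```

Assuming `[console]::OutputEncoding` has been switched to `[Text.Encoding]::Unicode` as in the accepted answer, it would be used as `sfc /? | Remove-CrCrLfArtifact | Tee-Object -Variable content`, which passes each line on as soon as PowerShell receives it; `$content` then holds an array of lines rather than a single string.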

Looks like sfc outputs UTF-16LE ("Unicode") without a BOM. Amazing.

cmd /c 'sfc > out'
get-content out -Encoding Unicode | where { $_ } # singlespace

Output:

Microsoft (R) Windows (R) Resource Checker Version 6.0
Copyright (C) Microsoft Corporation. All rights reserved.
Scans the integrity of all protected system files and replaces incorrect versions with
correct Microsoft versions.
SFC [/SCANNOW] [/VERIFYONLY] [/SCANFILE=<file>] [/VERIFYFILE=<file>]
    [/OFFWINDIR=<offline windows directory> /OFFBOOTDIR=<offline boot directory>]
/SCANNOW        Scans integrity of all protected system files and repairs files with
                problems when possible.
/VERIFYONLY     Scans integrity of all protected system files. No repair operation is
                performed.
/SCANFILE       Scans integrity of the referenced file, repairs file if problems are
                identified. Specify full path <file>
/VERIFYFILE     Verifies the integrity of the file with full path <file>.  No repair
                operation is performed.
/OFFBOOTDIR     For offline repair specify the location of the offline boot directory
/OFFWINDIR      For offline repair specify the location of the offline windows directory
e.g.
        sfc /SCANNOW
        sfc /VERIFYFILE=c:\windows\system32\kernel32.dll
        sfc /SCANFILE=d:\windows\system32\kernel32.dll /OFFBOOTDIR=d:\ /OFFWINDIR=d:\windows
        sfc /VERIFYONLY

Or delete the NULs and blank lines (Windows prints NULs as spaces):

(sfc) -replace "`0" | where {$_}
js2010
  • Good analysis, but note that console programs never write a BOM to _stdout_. BOMs are typically only used at the start of _files_. Blindly removing NULs (`` `0 ``) only works as intended if the text is limited to Unicode characters in the 8-bit range, which is the ISO-8859-1 range, which excludes some Windows-1252 characters; for instance, it wouldn't work with a `€` character. – mklement0 Sep 02 '19 at 04:22
  • In this particular scenario, `` -replace "`0" `` actually only works as expected for _ASCII-range_ characters (7-bit range), given the OEM-code-page-based decoding that is mistakenly applied. Given that the OP's localized French output contains Unicode characters in the 8-bit range (values between `0x80` and `0xFF`, such as `é` (`U+00E9`)), replacing the NULs is not an option. In the general case (ISO-8859-1-compatible code points in the lower bytes), the presence of any Unicode code units in the input with value 256 (`0x100`) or higher (i.e., ones whose high byte isn't zero) would result in corruption. – mklement0 Sep 05 '19 at 17:03
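The general case from the second comment (ISO-8859-1-compatible decoding) can be checked in-process with the following sketch. The function name `Test-NulRemoval` is hypothetical, and ISO-8859-1 again stands in for a single-byte decoding; under the OEM code page actually in effect (e.g., 850), even `é` fails, because it is misread as `Ú`.

```powershell
# Encode as UTF-16LE, misdecode with a single-byte code page (ISO-8859-1 here),
# then apply the NUL-removal workaround and check whether the text survived.
function Test-NulRemoval([string] $Text) {
    $bytes   = [Text.Encoding]::Unicode.GetBytes($Text)
    $misread = [Text.Encoding]::GetEncoding(28591).GetString($bytes)
    ($misread -replace "`0") -ceq $Text
}

Test-NulRemoval ([string][char]0x00E9)   # 'é' (bytes E9 00): True
Test-NulRemoval ([string][char]0x20AC)   # '€' (bytes AC 20): False, it becomes '¬' plus a space
```

The `€` case fails because its code unit, `0x20AC`, has a non-zero high byte, so there is no NUL to remove and the second byte turns into a stray character.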