I wrote a script that works perfectly in both PS7 and PS5, except it writes the log entries with null spaces after every character in PS5. I've never seen this before and can't find much about it online.

Code that creates a log entry:

    Write-Output "$logDate $thisScriptName v$thisScriptVersion - Script Started" | Out-File $logHistory -Append

The log file (open in VSCode): [screenshot: null character insertion]

I can't copy and paste the text from VSCode (hence the snip) as it just pastes a null (0) value into the text area.

The correctly formatted text is added with ps7.

EDIT: As per comments, this was fixed by adding -Encoding ascii (or utf8) to the Out-File. Using Add-Content -Value $string -Path $file also worked for me.

  • Seems to be an encoding problem utf-8 vs utf-16? Try `Out-File -Encoding utf8` – T-Me Mar 13 '23 at 20:55
  • Could you check [this](https://stackoverflow.com/questions/3806305/powershell-2-0-generates-nulls-between-characters) and your files' encoding? – Dávid Laczkó Mar 13 '23 at 20:55
  • Quite right - thanks. I tried using Add-Content instead of piping to Out-File which worked without any problems. Changing the encoding also stops the issue. Appreciated! – RapidScampi Mar 13 '23 at 21:04
  • out-file -append is positively dangerous, and can mix encodings in the same file. – js2010 Mar 15 '23 at 01:34

1 Answer

tl;dr

  • Out-File -Append always uses a default character encoding, irrespective of the encoding of the existing content of the target file; in Windows PowerShell, this default is "Unicode" (UTF-16LE), which explains the unexpected NUL characters. Therefore, use the -Encoding parameter to match the character encoding of the existing content; e.g.:

    "$logDate $thisScriptName v$thisScriptVersion - Script Started" |
      Out-File -Encoding utf8 $logHistory -Append
    
  • By contrast, Add-Content tries to match the existing content's encoding, but Windows PowerShell and PowerShell (Core) make different assumptions with respect to what the existing encoding is if the existing file lacks a Unicode BOM (byte-order mark):

    • Windows PowerShell assumes ANSI encoding, whereas PowerShell (Core) assumes UTF-8. The only case in which this difference does not matter is if both the existing file content and the new content being added are composed of ASCII-range characters only.
    • Therefore, in cross-edition code you may need -Encoding even with Add-Content.
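As a rough illustration of the advice above, the following sketch creates a log file and appends to it while naming the encoding on every write, then counts NUL bytes in the result. The file name and message are made up for the demo, and `ascii` is chosen only because it produces the same bytes in both editions for ASCII-range text; it is not the only workable choice.

```powershell
# Hypothetical log path and message, for illustration only.
$logHistory = Join-Path ([IO.Path]::GetTempPath()) 'encoding-demo.log'
$line = 'demo-script v1.0 - Script Started'

# Create the file, then append to it, naming the encoding both times so that
# Windows PowerShell and PowerShell (Core) write the same bytes.
Set-Content -Path $logHistory -Value $line -Encoding ascii
$line | Out-File -FilePath $logHistory -Encoding ascii -Append

# Inspect the raw bytes: ASCII-encoded text contains no NUL (0x00) bytes.
$bytes = [IO.File]::ReadAllBytes($logHistory)
$nulCount = @($bytes | Where-Object { $_ -eq 0 }).Count
"NUL bytes in log: $nulCount"
```

With the explicit `-Encoding` on both calls this should report zero NUL bytes in either edition. Dropping `-Encoding` from the `Out-File -Append` call in Windows PowerShell is what produces the interleaved NULs described in the question.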

Background information:

In Windows PowerShell (the legacy, Windows-only, ships-with-Windows edition whose latest and last version is 5.1):

  • Out-File's default encoding - whether or not you use -Append - is "Unicode", i.e. UTF-16LE, in which ASCII-range Unicode characters have a NUL (0x0) byte as the high byte of each two-byte sequence encoding such a character, which is what you saw. This also applies to the redirection operators, > and >>, which are in effect aliases of Out-File and Out-File -Append.

    • Therefore, if you use Out-File -Append with an existing file that uses an encoding other than UTF-16LE, you must use the -Encoding parameter explicitly, e.g.,
      ... | Out-File -Encoding utf8 -Append $logHistory
  • By contrast, Add-Content and Set-Content default to the system's active ANSI code page, as determined by the legacy system locale (aka language for non-Unicode programs), such as Windows-1252 on US-English systems.

    • However, unlike its conceptual analog, Out-File -Append, Add-Content tries to match the file's existing encoding based on the presence of a Unicode BOM. If there is none, the ANSI default applies.
      Notably, this means that a BOM-less UTF-8 file will not be correctly appended to, and you'll need -Encoding utf8 in that case too.
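The two byte-level effects described above can be observed without touching any files. This is only a sketch using the .NET encoding classes directly; it assumes code page 1252 stands in for "ANSI" (as on US-English systems) and that this code page is available in your session. The variable names are illustrative.

```powershell
# UTF-16LE ("Unicode" in Windows PowerShell): every ASCII-range character
# is stored as its ASCII byte followed by a NUL (0x00) high byte.
$utf16Hex = ([Text.Encoding]::Unicode.GetBytes('Hi') |
    ForEach-Object { '{0:X2}' -f $_ }) -join ' '
$utf16Hex          # 48 00 69 00

# The UTF-8 bytes of U+00E9 (e with acute accent), decoded as Windows-1252,
# which mimics treating BOM-less UTF-8 content as ANSI.
$utf8Bytes = [Text.Encoding]::UTF8.GetBytes([string][char]0xE9)
$misread = [Text.Encoding]::GetEncoding(1252).GetString($utf8Bytes)
$misread.Length    # 2: one character went in, two came out
```

The first result shows where the NULs in the log come from; the second shows why a mismatch between UTF-8 and ANSI only becomes visible once non-ASCII characters are involved.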

That different cmdlets in Windows PowerShell have different defaults is unfortunate - see the bottom section of this answer for an overview.

Fortunately, this inconsistent use of character encodings has been rectified in PowerShell (Core) 7+ - see below.


In PowerShell (Core) 7+ (the modern, cross-platform, install-on-demand edition):

  • Fortunately, the consistent default is now (BOM-less) UTF-8 - this applies across all cmdlets and also to how the PowerShell engine itself reads source-code files.

    • Notably, this means that ANSI-encoded source-code files will be misread by PowerShell (Core) if non-ASCII characters are present, and the safe cross-edition way to save source-code files is to use UTF-8 with BOM.
  • The fundamental behaviors of Out-File -Append and Add-Content still apply, however, which means:

    • Using Out-File -Append without -Encoding always appends UTF-8-encoded text.

    • Using Add-Content without -Encoding appends UTF-8-encoded text if the existing file has no BOM.

    • Notably, this means that an ANSI-encoded file (which never has a BOM) will have incorrectly encoded text appended to it and requires use of
      -Encoding ([cultureinfo]::CurrentCulture.TextInfo.ANSICodePage)

      • Surprisingly, the -Encoding parameter does not accept Ansi as an identifier as of PowerShell 7.3.3, though this will likely be rectified in PowerShell 7.4 - see GitHub PR #19298
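Since so much depends on whether the existing file starts with a BOM, it can help to check before appending. The function below is a hypothetical helper (the name `Get-FileBom` is not a built-in cmdlet) that recognizes only the UTF-8 and UTF-16 BOMs; it is a sketch for small files such as logs, not a general encoding detector.

```powershell
function Get-FileBom {
    # Hypothetical helper: names the Unicode BOM at the start of a file, or 'none'.
    # Pass a full path: .NET file APIs do not follow PowerShell's current location.
    param([Parameter(Mandatory)][string] $Path)

    # Reads the whole file for simplicity; acceptable for a small-file sketch.
    $bytes = [IO.File]::ReadAllBytes($Path)
    $head = ($bytes | Select-Object -First 3 |
        ForEach-Object { '{0:X2}' -f $_ }) -join ' '

    if ($head -eq 'EF BB BF')      { return 'UTF-8' }
    if ($head.StartsWith('FF FE')) { return 'UTF-16LE' }   # UTF-32LE starts the same way; not distinguished here
    if ($head.StartsWith('FE FF')) { return 'UTF-16BE' }
    return 'none'
}

# Demo with a made-up temp file: the same text saved with and without a BOM.
$demo = Join-Path ([IO.Path]::GetTempPath()) 'bom-demo.txt'
[IO.File]::WriteAllText($demo, 'abc', (New-Object System.Text.UTF8Encoding $true))
$withBom = Get-FileBom $demo       # UTF-8
[IO.File]::WriteAllText($demo, 'abc', (New-Object System.Text.UTF8Encoding $false))
$withoutBom = Get-FileBom $demo    # none
```

A result of `none` is the case in which the two editions disagree about the existing encoding, so that is when an explicit -Encoding argument matters most.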

mklement0