I have spent an hour trying dozens of different options: stripping the BOM, setting the file encoding to ascii / oem / utf8, and following the ideas here, but the BOM remains in every case. I'm a bit stumped.

Firstly, I always Remove-Item with -Force to make sure that there is no remnant of the file before I start, then I pipe some simple text into a newly created file:

$out = @'
####################
#
# Shortcuts
#
####################
'@
$OutputEncoding = [Text.Utf8Encoding]::new($false)
$out | Out-File -FilePath $ShortcutNotes -Force -Encoding utf8
# I have tried utf8 / oem / ascii

This part of the text always looks fine in the final file, with no strangeness.

But then I pipe some fairly innocuous output from some operations into the file and everything goes wrong:

# Some operations before here, but this is the important line:
$AppName = (($AppShortcut -split "\\")[-1] -split "\.")[0]
"App added: $AppName" | Out-File -FilePath $ShortcutNotes -Append

End result, I get UTF8 with BOM when I open the file in Notepad++ and it looks like this (header part is always fine, then the rest has the NULLs):

####################
#
# Shortcuts
#
####################

A[NULL]p[NULL]p[NULL] [NULL]a[NULL]d[NULL]d[NULL]e[NULL]d[NULL]:[NULL] [NULL]G[NULL]o[NULL]o[NULL]g[NULL]l[NULL]e[NULL] [NULL]C[NULL]h[NULL]r[NULL]o[NULL]m[NULL]e

I've even tried to strip BOM afterwards:

(Get-Content $ShortcutNotes) -replace "\xEF\xBB\xBF", "" | Set-Content $ShortcutNotes

All of the above fails, and the file is always UTF8 with BOM and always garbled as above.

I've run out of things to try. Can anyone see what I am doing wrong that makes my output always garbled in this way?

Update: The operations that I'm doing after the header are very simple, but maybe they will suggest what is going wrong. This is a snippet from the function that is called and that writes to the file. It seems to be this that is causing the issue, but I don't know how or why:

function Setup-AppAndShortcut ($AppFolder, $ZipFile, $AppExe, $AppShortcut) {
    if (Test-Path $AppExe) {
        $ShortcutFile = $AppShortcut
        $Shortcut = $WScriptShell.CreateShortcut($ShortcutFile)
        $Shortcut.TargetPath = $AppExe
        $Shortcut.Save()
        [string]$AppName = (($AppShortcut -split "\\")[-1] -split "\.")[0]
        "App added: $AppName" | Out-File -FilePath $ShortcutNotes -Append   # These are the lines that always end up garbled with NULLs in my final output.
    }
}

YorSubs
  • Does this answer your question? [Using PowerShell to write a file in UTF-8 without the BOM](https://stackoverflow.com/a/5596984/1701026) – iRon Sep 07 '22 at 06:45
  • Thanks, sadly not, just tried it now, I will update my question with information about the operations that I'm making, maybe you will see something that I'm doing wrong there. – YorSubs Sep 07 '22 at 08:43
  • I noticed that after I use your technique, regardless of whether I use `-Encoding ascii` or `utf8` or `oem`, when I open the final file in Notepad++ it always reports as `ANSI` (whereas it would previously report as `UTF8 with BOM`), but the output is still full of `NULL` between every character. – YorSubs Sep 07 '22 at 08:49
  • "*it always reports as `ANSI`*": that is OK, as the encoding can't be determined from the content (all 7-bit chars and no BOM defined). For what you are doing with `$AppName = (($AppShortcut -split "\\")[-1] -split "\.")[0]`, the output encoding is related to the encoding set by the host (Windows PowerShell console/IDE/VS Code); see: [Using UTF-8 Encoding (CHCP 65001) in Command Prompt / Windows Powershell (Windows 10)](https://stackoverflow.com/a/57134096/1701026). Try: `$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding` – iRon Sep 07 '22 at 09:13
  • What is actually in `$AppShortcut`? That is where your issue originates: it apparently outputs a different encoding than you intend to use. Just `'Test' | Out-File -FilePath $ShortcutNotes -Append` works fine for me. See also: [Powershell correct encoding of exe-output](https://stackoverflow.com/a/35911310/1701026) – iRon Sep 07 '22 at 09:32
  • In PowerShell 7.x `Out-File` defaults to writing `utf8NoBOM`. If you're using PowerShell 5.x, then the default output is `Unicode` (==> UTF-16LE) – Theo Sep 07 '22 at 12:02
  • `Out-File -Append` can mix encodings. I prefer `Add-Content`. – js2010 Sep 07 '22 at 12:17
  • I pointed this out three years ago: https://github.com/PowerShell/PowerShell/issues/9423 – js2010 Sep 07 '22 at 13:30
  • Replacing `Out-File` by `Add-Content` to stop `Out-File` from blindly altering encoding completely resolved the issue, thanks (I have been scratching my head at this issue for more than a week). I will avoid `Out-File` whenever possible in future as its behaviour (even though matching POSIX) is strange and almost always **not** what I want to happen. – YorSubs Sep 08 '22 at 03:27
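
The NULLs described in the comments above are what UTF-16LE text looks like when a viewer decodes it as a single-byte or UTF-8 encoding. As a rough illustration (the sample string `App` is chosen here, not taken from the original script), the following sketch prints the bytes that each encoding produces for the same text:

```powershell
# Illustration only: compare the bytes of the same string in two encodings.
$text = 'App'

# [Text.Encoding]::Unicode is UTF-16LE, the Out-File default in Windows PowerShell 5.x.
$utf16Bytes = [Text.Encoding]::Unicode.GetBytes($text)
$utf8Bytes  = [Text.Encoding]::UTF8.GetBytes($text)

# Format each byte as two hex digits.
$utf16Hex = ($utf16Bytes | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
$utf8Hex  = ($utf8Bytes  | ForEach-Object { '{0:X2}' -f $_ }) -join ' '

"UTF-16LE: $utf16Hex"   # 41 00 70 00 70 00
"UTF-8:    $utf8Hex"    # 41 70 70
```

Every ASCII character in UTF-16LE is followed by a `00` byte, which an editor that has already settled on UTF-8 or ANSI shows as NULL.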

1 Answer

Use `Add-Content` instead of `Out-File -Append`, which can mix different encodings in the same file. Bug report:

[out-file -append (or >>) can mix two encodings in the same file · Issue #9423 · PowerShell · GitHub](https://github.com/PowerShell/PowerShell/issues/9423)
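
A minimal sketch of that approach, under some assumptions: the temp-file path and the hard-coded app name are hypothetical stand-ins for the question's `$ShortcutNotes` and `$AppName`, and passing `-Encoding utf8` on both calls is a belt-and-braces choice made here rather than something the bug report prescribes.

```powershell
# Hypothetical path for illustration; the question builds $ShortcutNotes elsewhere.
$ShortcutNotes = Join-Path ([IO.Path]::GetTempPath()) 'ShortcutNotes.txt'
Remove-Item $ShortcutNotes -Force -ErrorAction SilentlyContinue

$header = @'
####################
#
# Shortcuts
#
####################
'@

# Create the file with an explicit encoding.
$header | Set-Content -Path $ShortcutNotes -Encoding utf8

# Append with the same explicit encoding, so both writes agree.
"App added: Google Chrome" | Add-Content -Path $ShortcutNotes -Encoding utf8

# Sanity check: count NUL bytes in the resulting file.
$bytes = [IO.File]::ReadAllBytes($ShortcutNotes)
$nulCount = @($bytes | Where-Object { $_ -eq 0 }).Count
"NUL bytes in file: $nulCount"
```

Note that in Windows PowerShell 5.x `-Encoding utf8` writes a BOM at the start of the file, while in PowerShell 7.x it does not. If `Out-File -Append` has to stay, giving it the same `-Encoding` value as the first write should also avoid the mixed file.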

js2010