
I've created a UTF-8-encoded PowerShell script containing non-ASCII characters.

characters.ps1:

Write-Host "ç â ã á à"

When I run the script in the PowerShell console, it outputs the wrong characters.


However, if I type the characters directly into the console, they are shown as expected.


Does anyone know what causes this behavior?

The problem arose in a script I wrote that has hard-coded paths containing non-ASCII characters. When I pass such a path as an argument to a command (in my case, robocopy to copy a folder), the command fails because it cannot find the path, which is also printed incorrectly on screen.

Arthur Nunes
  • With the character "Ä" (capital ä) it's even worse: as soon as you put it between double quotation marks, it produces an error if the file is encoded as UTF-8 without BOM. – gustavwiz Jan 31 '18 at 09:10
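The comment above has a plausible explanation, assuming Windows PowerShell reads the BOM-less file using the Windows-1252 code page. The UTF-8 bytes of "Ä" are `C3 84`, and byte `0x84` in Windows-1252 is „ (U+201E, DOUBLE LOW-9 QUOTATION MARK). PowerShell's tokenizer accepts typographic quotes such as U+201E as double quotation marks, so the string literal closes early and the ASCII `"` that follows opens a string that is never terminated.

The Python sketch below only reproduces the byte-level misreading. It does not run PowerShell.

```python
# Sketch: how the UTF-8 bytes of "Ä" look when misread as Windows-1252.
# Assumption: the BOM-less script is decoded with code page 1252.
script_text = 'Write-Host "Ä"'

raw_bytes = script_text.encode("utf-8")  # what the editor saved (no BOM)
misread = raw_bytes.decode("cp1252")     # what a cp1252 reader would see

print(raw_bytes)  # b'Write-Host "\xc3\x84"'
print(misread)    # Write-Host "Ã„"

# U+201E is one of the typographic double quotes PowerShell accepts,
# so the literal "Ã„" ends early and the trailing " is left unbalanced.
low_quote = "\u201e"
print(low_quote in misread)  # True
```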

4 Answers


Changing the encoding of the script to UTF-8 with BOM solved the issue.

I was using SublimeText with the EncodingHelper plugin to control the script's character set. It was correctly set to UTF-8.

I changed the encoding of the script in SublimeText to "UTF-8 with BOM" and the output was shown correctly.

I created the same script with Notepad++, which defaults to "UTF-8 with BOM", and the string was shown correctly in the console.

I changed the encoding of the script in Notepad++ to "UTF-8 without BOM" and it was shown incorrectly.

It seems PowerShell cannot correctly guess the encoding of UTF-8 files that have no BOM.
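A small Python model makes the BOM dependency concrete. It rests on two assumptions that are not taken from PowerShell's source:

- Windows PowerShell decodes a script as UTF-8 only when the file starts with the UTF-8 BOM (`EF BB BF`).
- Otherwise it falls back to the system ANSI code page, modeled here as Windows-1252.

The function name `read_script_like_windows_powershell` is hypothetical.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # b'\xef\xbb\xbf'


def read_script_like_windows_powershell(data: bytes, ansi_codepage: str = "cp1252") -> str:
    """Hypothetical model: UTF-8 if a BOM is present, else the ANSI code page."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):].decode("utf-8")
    return data.decode(ansi_codepage)


source = 'Write-Host "ç â ã á à"'
without_bom = source.encode("utf-8")
with_bom = UTF8_BOM + without_bom

print(read_script_like_windows_powershell(with_bom))     # Write-Host "ç â ã á à"
# Without the BOM, each two-byte letter turns into "Ã" plus a second character;
# the last one ends in U+00A0 (no-break space).
print(read_script_like_windows_powershell(without_bom))  # Write-Host "Ã§ Ã¢ Ã£ Ã¡ Ã "
```

Under this model, the mangled string is also what a script would pass to robocopy, which would explain why the hard-coded path is not found.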

Arthur Nunes
  • This is pathetic, especially considering [how pointless and useless](/questions/2223882/whats-different-between-utf-8-and-utf-8-without-bom) a UTF-8 BOM is. +1 for *enlightening information*, though. – ulidtko Jan 05 '15 at 16:22
  • Tried like 10 commands with no result, and it was just that... Thanks, bro – Nicolas Leucci Nov 07 '17 at 14:06
  • I would guess that in the absence of the BOM, Windows assumes Windows-1252 encoding for legacy reasons, unlike Linux, which assumes UTF-8. – dOxxx Jun 19 '19 at 12:15
  • This happened to me too. I created a PowerShell script with VS Code that created an Azure AD group with accented characters in the group description. Something was mangling the description, and it looks like that something was PowerShell. VS Code created the script as UTF-8 with no BOM, but I used Notepad++ to add the BOM and that fixed it. – Cory Grimster Mar 24 '21 at 22:43
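Re-saving each file in Notepad++ works, but the same fix can be scripted. The sketch below assumes the files are already valid UTF-8. It prepends the BOM only when the BOM is missing. It refuses to touch files that do not decode as UTF-8, since those are probably in a legacy code page and need converting first. `ensure_utf8_bom` is a hypothetical helper name.

```python
import codecs
from pathlib import Path


def ensure_utf8_bom(path: Path) -> bool:
    """Prepend a UTF-8 BOM to the file if it lacks one; return True if it changed."""
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False  # already UTF-8 with BOM
    data.decode("utf-8")  # raises UnicodeDecodeError if the file is not valid UTF-8
    path.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

For example, `ensure_utf8_bom(Path("characters.ps1"))` would turn the question's script into UTF-8 with BOM. Calling it a second time leaves the file unchanged.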

In my case, the problem was caused by creating a new PowerShell script with Visual Studio Code, whose default encoding is UTF-8 without BOM. Setting the encoding to "Windows 1252" solved the problem.

It seems that PowerShell can't handle UTF-8 without BOM; it needs either "Windows 1252" or "UTF-8 with BOM" encoding.
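Saving as "Windows 1252" works for these Portuguese letters because each one has a single-byte code in that code page. It is a narrower fix than UTF-8 with BOM, though. The Python sketch below illustrates two limits, under stated assumptions:

- The code page cannot represent characters outside its Western European repertoire. Greek "Ω" is used here as an example.
- A reader that treats BOM-less files as UTF-8 would misread the Windows-1252 bytes. This is assumed to match PowerShell 6 and later.

`fits_cp1252` is a hypothetical helper.

```python
text = "ç â ã á à"

# Each accented letter is one byte in Windows-1252 (e.g. "ç" -> 0xE7).
cp1252_bytes = text.encode("cp1252")
print(cp1252_bytes)  # b'\xe7 \xe2 \xe3 \xe1 \xe0'


def fits_cp1252(s: str) -> bool:
    """Return True if every character of s exists in Windows-1252."""
    try:
        s.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


print(fits_cp1252(text))  # True
print(fits_cp1252("Ω"))   # False

# A reader that assumes UTF-8 for BOM-less files sees malformed sequences,
# which a lenient decoder replaces with U+FFFD.
as_utf8 = cp1252_bytes.decode("utf-8", errors="replace")
print(as_utf8)
```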

Philippe
awineb

There is a reliable way to detect UTF-8 without a BOM (https://unicodebook.readthedocs.io/guess_encoding.html). Like a lot of other little things, this seems to work better in PowerShell 6. Even my beloved Emacs 25 for Windows gets the encoding wrong.

PS C:\users\admin> pwsh
PowerShell 6.1.0
Copyright (c) Microsoft Corporation. All rights reserved.

https://aka.ms/pscore6-docs
Type 'help' to get help.

PS C:\users\admin> "write-host 'ç â ã á à'" | set-content -Encoding utf8NoBOM accent.ps1
PS C:\users\admin> .\accent
ç â ã á à
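The linked page relies on the fact that non-ASCII UTF-8 follows a strict multi-byte pattern. As a result, a strict UTF-8 decode rarely succeeds on text saved in a legacy 8-bit code page.

Below is a rough Python sketch of that idea. It is not presented as PowerShell 6's actual logic, which is believed to simply default to UTF-8 for BOM-less files. The name `guess_encoding` and the fallback to Windows-1252 are assumptions.

```python
import codecs


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess a codec name for data, in the spirit of the linked heuristic."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        data.decode("ascii")
        return "ascii"  # pure ASCII reads the same in every candidate encoding
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")  # strict: any malformed sequence raises
        return "utf-8"
    except UnicodeDecodeError:
        return fallback


sample = "write-host 'ç â ã á à'"
print(guess_encoding(sample.encode("utf-8")))               # utf-8
print(guess_encoding(sample.encode("cp1252")))              # cp1252
print(guess_encoding(codecs.BOM_UTF8 + b"write-host 'x'"))  # utf-8-sig
```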
js2010

Try this before invoking your script:

 $OutputEncoding = [Console]::OutputEncoding
Loïc MICHEL
  • What does this code do? What is its *meaning*? Why is this necessary? – ulidtko Jan 05 '15 at 16:23
  • @ulidtko It's a [preference variable](https://msdn.microsoft.com/en-us/powershell/reference/5.1/microsoft.powershell.core/about/about_preference_variables) that changes the encoding PowerShell uses when writing data to other programs (e.g., when data leaves PowerShell for find.exe's stdin). The encoding defaults to ASCII because that was more compatible at the time the design was set. https://blogs.msdn.microsoft.com/powershell/2006/12/11/outputencoding-to-the-rescue/ (Based on this, I suspect this answer would be ineffective for this question.) – TessellatingHeckler Jun 09 '17 at 00:15
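A small Python model supports the suspicion in the last comment. It assumes the mangling happens when the BOM-less script is decoded as Windows-1252. If so, the string is already wrong before any output encoding is applied. Re-encoding it for the console or a child process (the step `$OutputEncoding` governs) then faithfully carries the wrong characters through rather than restoring the originals.

```python
original = "ç â ã á à"

# Step 1 (assumed): the BOM-less UTF-8 script is decoded as Windows-1252.
in_memory = original.encode("utf-8").decode("cp1252")

# Step 2: any lossless output encoding round-trips the already-mangled
# string; it cannot recover the original text.
for output_encoding in ("utf-8", "utf-16-le"):
    sent = in_memory.encode(output_encoding)
    received = sent.decode(output_encoding)
    print(output_encoding, received == in_memory, received == original)
```

Each iteration prints the encoding name followed by `True False`.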