I'm on Windows 10 with PowerShell 5.1.

printargs.py is:

#! /usr/bin/python3
import sys
for arg in sys.argv:
    print(arg)

Case 1

I have a Windows batch file runme.bat:

chcp 65001
py printargs.py ä

Note: py.exe is the Python launcher for Windows

This works: I invoke the batch file in a PowerShell terminal and get the output

printargs.py
ä

Case 2

Now I want a PowerShell script, runme.ps1, that does exactly the same thing:

# What code must go here?
& py printargs.py ä

This does NOT work. Because of some encoding problem I get

printargs.py
ä

I am aware of this question.

I tried without success:

$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding
17tmh
  • The fix may be as simple as setting the localization correctly or doing something like r('a'); see the details (the article contains lots of informative links): https://nick.groenen.me/posts/%C3%A4-and-%C3%A4-are-not-the-same-character/ and https://docs.python.org/3/library/stdtypes.html#textseq – mzm Dec 21 '22 at 14:09
  • I don't think this behavior is related to Python. – 17tmh Dec 21 '22 at 19:28
  • Make the .ps1 file utf8withbom encoding. – js2010 Dec 22 '22 at 02:57
  • @js2010 you're right, this solves the problem. As far as I understand it, the BOM signals to PowerShell that the ps1 file is UTF-8 encoded. Is there no way to do this programmatically and without adding the BOM? I'm a bit confused; could anybody elaborate on the topic or point me to a full answer? – 17tmh Dec 22 '22 at 08:03

1 Answer

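This comes up a lot. Windows PowerShell 5.1 doesn't recognize a script saved as UTF-8 without a BOM; it reads the file in the legacy ANSI code page instead. PowerShell 7 handles UTF-8 without a BOM. The command below converts the script to UTF-8 with a BOM. Run it in Windows PowerShell 5.1, where -Encoding utf8 writes a BOM (in PowerShell 7, use -Encoding utf8BOM).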

(get-content -encoding utf8 script.ps1) | set-content script.ps1 -encoding utf8


# BOM tests (-AsByteStream needs PowerShell 6+; in Windows PowerShell 5.1 use -Encoding Byte instead)
(get-content script.ps1 -AsByteStream)[0] -eq 0xef -and 
(get-content script.ps1 -AsByteStream)[1] -eq 0xbb

True


# right side gets converted to string
'239 187' -eq (get-content script.ps1 -AsByteStream)[0..1]

True
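
For anyone who would rather do the conversion and the check from Python, here is a minimal sketch of the same idea. The function names `has_utf8_bom` and `add_utf8_bom` are made up for illustration, and it assumes the script is already valid UTF-8. It works on raw bytes, so line endings stay untouched.

```python
import codecs
from pathlib import Path


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def add_utf8_bom(path):
    """Prepend a UTF-8 BOM to the file unless it already has one.

    Hypothetical helper: assumes the file content is valid UTF-8.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return  # already has a BOM, nothing to do
    data.decode("utf-8")  # raises UnicodeDecodeError if the file is not UTF-8
    p.write_bytes(codecs.BOM_UTF8 + data)
```

Calling `add_utf8_bom("runme.ps1")` twice is harmless, because the second call finds the BOM and returns without writing.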
js2010
  • Probably the main takeaway from [Microsoft docs](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_character_encoding?view=powershell-7.3): "If you need to use non-Ascii characters in your scripts, save them as UTF-8 with BOM. Without the BOM, Windows PowerShell misinterprets your script as being encoded in the legacy "ANSI" codepage." – 17tmh Dec 23 '22 at 09:23
  • On my system I got the following: If `in.ps1` is encoded as utf8-without-bom *and* contains non-Ascii chars, then (**your code**) `(get-content in.ps1) | set-content out.ps1 -encoding utf8` produces a file `out.ps1` with utf8-with-bom encoding but also *messed up non-Ascii chars*. On the other hand the non-Ascii chars are *handled correctly* with `(get-content in.ps1 -encoding utf8) | set-content out.ps1 -encoding utf8`. – 17tmh Dec 23 '22 at 09:31
  • Correction made. – js2010 Dec 23 '22 at 14:44
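
The misreading described in the docs quote above can be reproduced in Python. This is only a sketch: the helper name `misread_as_ansi` is made up, and it assumes the legacy ANSI code page is Windows-1252 (common on Western European systems; other locales use other code pages).

```python
def misread_as_ansi(text, ansi_codepage="cp1252"):
    """Encode text as UTF-8, then decode the bytes in an ANSI code page.

    This mimics reading a BOM-less UTF-8 file as if it were ANSI.
    Assumption: cp1252 is the system's ANSI code page.
    """
    utf8_bytes = text.encode("utf-8")  # "ä" becomes the two bytes C3 A4
    return utf8_bytes.decode(ansi_codepage)  # each byte becomes one character
```

With these assumptions, `misread_as_ansi("ä")` returns `Ã¤`, while pure ASCII text such as `printargs.py` passes through unchanged. That is why only the non-ASCII argument is affected.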