16

Under PowerShell v5, Windows 8.1, Python 3. Why does the following fail, and how can I fix it?

[system.console]::InputEncoding = [System.Text.Encoding]::UTF8; 
[system.console]::OutputEncoding = [System.Text.Encoding]::UTF8; 
chcp; 
"import sys
print(sys.stdout.encoding)
print(sys.stdin.encoding)
sys.stdout.write(sys.stdin.readline())
" | 
sc test.py -Encoding utf8; 
[char]0x0422+[char]0x0415+[char]0x0421+[char]0x0422+"`n" | py -3 test.py

prints:

Active code page: 65001
cp65001
cp1251
п»ї????
Martijn Pieters
Artyom
  • Every program has its own stdin/stdout encoding. There's no global system setting that can override another program's settings, so whatever you set in PowerShell affects only PowerShell. Set your python stdin encoding manually, if possible. I think there should be lots of examples for that. – wOxxOm Aug 27 '16 at 18:14
  • @wOxxOm Are there any conventions for Python? It seems to get its stdout encoding from the system one. But why not stdin? – Artyom Aug 27 '16 at 19:44
  • My point is that every program uses its own heuristics and logic, so whatever you set in PowerShell doesn't apply to Python's handling of stdin. Well, generally. Python 3 tries to be smart and guess the stdin encoding, but you can't *rely* on that, of course. As I said, there should be LOTS of examples of how to set the stdin encoding in Python. – wOxxOm Aug 27 '16 at 19:49
  • `$OutputEncoding=[System.Text.Encoding]::UTF8`? – user4003407 Aug 27 '16 at 19:59

2 Answers

8

You are piping data into Python; at that point Python's stdin is no longer attached to a TTY (your console), so Python won't take the encoding from the console. Instead, the default system locale encoding is used; on your system that's cp1251 (the Windows Cyrillic codepage).
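To illustrate why the question's last output line starts with `п»ї`, here is a minimal Python sketch. It assumes the byte string that the asker reports in the comments below for `sys.stdin.buffer.readline()`; the variable names are only illustrative.

```python
# Bytes the asker later reports receiving on stdin: a UTF-8 byte order
# mark (EF BB BF) followed by literal question marks and a newline.
raw = b"\xef\xbb\xbf????\n"

# Decoded with the locale code page from the question (cp1251), each of
# the three BOM bytes becomes its own character.
as_cp1251 = raw.decode("cp1251")

# Decoded as UTF-8, the same three bytes become the single character U+FEFF.
as_utf8 = raw.decode("utf-8")

print(as_cp1251 == "п»ї????\n")    # True
print(as_utf8 == "\ufeff????\n")   # True
```

The question marks are ordinary `0x3F` bytes in both cases, so the choice of decoder only changes how the BOM is shown.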

Set the PYTHONIOENCODING environment variable to override:

PYTHONIOENCODING
If this is set before running the interpreter, it overrides the encoding used for stdin/stdout/stderr, in the syntax encodingname:errorhandler. Both the encodingname and the :errorhandler parts are optional and have the same meaning as in str.encode().

PowerShell doesn't appear to support per-command-line environment variables the way UNIX shells do; the easiest is to just set the variable first:

Set-Item Env:PYTHONIOENCODING "UTF-8"

or even

Set-Item Env:PYTHONIOENCODING "cp65001"

as the Windows UTF-8 codepage is apparently not quite UTF-8, depending on the Windows version and on whether or not pipe redirection is used.
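The effect of the variable can be checked from Python itself. The following is a rough sketch, not part of the original answer: the helper `child_stdin_encoding` is a hypothetical name, and it starts a child interpreter with a piped stdin, once without and once with `PYTHONIOENCODING` set.

```python
import os
import subprocess
import sys


def child_stdin_encoding(override=None):
    """Run a child interpreter with piped stdin and return its sys.stdin.encoding.

    `override` is the value to put in PYTHONIOENCODING (hypothetical helper).
    """
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # start from a clean state
    if override is not None:
        env["PYTHONIOENCODING"] = override
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdin.encoding)"],
        input=b"",                # stdin is a pipe, not a console
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


# Without the override the result depends on the system locale settings.
print(child_stdin_encoding())
# With the override the child reports the requested encoding.
print(child_stdin_encoding("utf-8"))
# The optional ":errorhandler" part is accepted as well.
print(child_stdin_encoding("utf-8:replace"))
```

On the asker's machine the first call would presumably report cp1251, matching the output in the question; on other systems it will differ.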

Martijn Pieters
  • Thank you for the answer. Yet it prints then this: "Active code page: 65001 UTF-8 UTF-8 ????" – Artyom Sep 12 '16 at 10:41
  • @Artyom: could you test with `sys.stdout.write(repr(sys.stdin.readline()))` please? That way we can see the contents of that line and if this is Python or Powershell getting things muddled. – Martijn Pieters Sep 12 '16 at 10:46
  • Then it is '\ufeff????\n'. PowerShell actually prints it correctly if it is not piped into Python. That's breaking my faith in using Python under PowerShell ;) – Artyom Sep 12 '16 at 10:49
  • @Artyom: so Python *received* those question marks from the pipe. This can't be Python's fault surely. Next test: `sys.stdout.write(repr(sys.stdin.buffer.readline()))`. Note the `buffer` there, you'll now get undecoded bytes. – Martijn Pieters Sep 12 '16 at 10:52
  • @Artyom: Also, according to [issue #13216](https://bugs.python.org/issue13216) there are differences between UTF-8 and CP65001 that matter when using redirects, so trying to set the `PYTHONIOENCODING` to `'cp65001'` *may* matter here. Sorry, no windows systems to test this on myself. – Martijn Pieters Sep 12 '16 at 10:53
  • Good way to test! Then it is "Active code page: 65001 cp65001 cp65001 b'\xef\xbb\xbf????\n'" Yes, still not received by Python there. – Artyom Sep 12 '16 at 10:56
  • @Artyom: then this is either a bug in Python's I/O layer, *or* Powershell is not handing in UTF-8 data but question marks. The [UTF-8 BOM](https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8) at the start is not encouraging either; that's a Microsoft-ism that is entirely redundant in UTF-8 but lets MS tools distinguish between 8-bit encodings and UTF-8 when opening a file. – Martijn Pieters Sep 12 '16 at 11:03
  • @Artyom: if you can find *other* tools that do handle the UTF-8 pipe correctly then I suggest you file a bug with the Python project at http://bugs.python.org/. – Martijn Pieters Sep 12 '16 at 11:04
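As a side note on the BOM discussed in these comments, here is a small sketch of how a script could drop a leading UTF-8 BOM itself. It is an assumption-laden illustration: an `io.BytesIO` object stands in for `sys.stdin.buffer`, and the payload is what a correct UTF-8 pipe would ideally deliver for the question's test string, which is not what the asker actually received.

```python
import io

# Stand-in for sys.stdin.buffer: a UTF-8 BOM followed by the question's
# four-letter Cyrillic test string and a newline (assumed ideal input).
payload = "\ufeff" + "\u0422\u0415\u0421\u0422\n"
fake_stdin_buffer = io.BytesIO(payload.encode("utf-8"))

# The "utf-8-sig" codec decodes UTF-8 and skips one leading BOM if present.
text_stdin = io.TextIOWrapper(fake_stdin_buffer, encoding="utf-8-sig")
line = text_stdin.readline()

print(line == "\u0422\u0415\u0421\u0422\n")  # True
```

In a real script the same wrapper could be applied as `sys.stdin = io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8-sig")`. This only removes the BOM; it cannot restore characters that already arrive as literal `?` bytes, as in the output reported above.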
2

Why not embed CPython in PowerShell?! CPython is easy to embed, and PowerShell is a very good REPL for playing with .NET and COM objects. Here is a simple introduction to using pythonnet from PowerShell. Note how the encoding is automatically propagated from PowerShell to Python.

Windows PowerShell
Copyright (C) 2015 Microsoft Corporation. All rights reserved.

PS C:\Users\denfromufa> [system.console]::InputEncoding = [System.Text.Encoding]::UTF8;
PS C:\Users\denfromufa> [system.console]::OutputEncoding = [System.Text.Encoding]::UTF8;
PS C:\Users\denfromufa> [Reflection.Assembly]::LoadFile("C:\Python\Miniconda3_64b\Lib\site-packages\Python.Runtime.dll")


GAC    Version        Location
---    -------        --------
False  v4.0.30319     C:\Python\Miniconda3_64b\Lib\site-packages\Python.Runtime.dll


PS C:\Users\denfromufa> $gil = [Python.Runtime.Py]::GIL()
PS C:\Users\denfromufa> $sys=[Python.Runtime.Py]::Import("sys")
PS C:\Users\denfromufa> $sys.stdin.encoding.ToString()
cp65001
PS C:\Users\denfromufa> $sys.stdout.encoding.ToString()
cp65001
PS C:\Users\denfromufa> $gil.Dispose()
PS C:\Users\denfromufa> [Python.Runtime.PythonEngine]::Shutdown()
PS C:\Users\denfromufa>

[EDIT]

Here is the snek package, released by one of the PowerShell developers, for embedding Python in PowerShell:

https://github.com/adamdriscoll/snek

denfromufa
  • 1
    Valuable addition! Still, the question is not answered (it looks like a bug). I wonder whether pythonnet will run Python scripts that handle encoding correctly. Yes, `stdin` got the right encoding there, so it looks promising – Artyom Sep 17 '16 at 12:14
  • 1
    Could you post whether pythonnet receives and pipes out the text correctly, as in the question? – Artyom Sep 17 '16 at 12:16