UTF-8 encoding exception with subprocess.run

Question

I'm having a hard time using the subprocess.run function with a command that contains accented characters (like "é" for example).

Consider this simple example :

# -*- coding: utf-8 -*-
import subprocess

cmd = "echo é"

result = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE)

print("Output of subprocess.run : {}".format(result.stdout.hex()))
print("é char encoded manually : {}".format("é".encode("utf-8").hex()))

It gives the following output :

Output of subprocess.run : 820d0a
é char encoded manually : c3a9

I don't understand the value returned by subprocess.run, shouldn't it also be c3a9 ? I understand the 0d0a is CR+LF, but why 82 ?

Because of this, when I try to run this line :

output = result.stdout.decode("utf-8")

I get a UnicodeDecodeError Exception with the following message : 'utf-8' codec can't decode byte 0x82 in position 0: invalid start byte

I tried explicitly specifying the encoding format like this :

result = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE, encoding="utf-8")

But this raises the same exception ('utf-8' codec can't decode byte 0x82 in position 0: invalid start byte) when subprocess.run is called.

I'm running this on Windows 10 with Python3.8.5.

I hope someone can help me with this, any hint ?

MagnusO_O · Accepted Answer · 2022-08-30T18:02:53.960

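As a fix, try decoding with cp437. In that code page the byte 0x82 is "é", which is why the output starts with 82 instead of c3a9: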

print("Output of subprocess.run : {}".format(result.stdout.decode('cp437')))

# or

result = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE, text=True, 
                        encoding="cp437")

print(f"Output of subprocess.run : {result.stdout}")

From other Stack Overflow answers, it seems that this Windows terminal encoding issue is old and should probably have been fixed by now, but it is apparently still present.

https://stackoverflow.com/a/37260867/11815313

Anyway, I have no deeper understanding of Windows 10 terminal encoding, but cp437 worked on my Win10 system.
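To check this without running a subprocess, here is a minimal sketch that decodes the exact bytes reported in the question (820d0a) by hand. The variable names are only illustrative:

```python
# Bytes reported in the question as the output of subprocess.run
raw = bytes.fromhex("820d0a")

# In cp437, the byte 0x82 maps to "é"; 0x0d 0x0a is CR+LF
decoded = raw.decode("cp437")
print(repr(decoded))

# In UTF-8, 0x82 can only be a continuation byte, so it cannot
# start a character and decoding fails
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc)

# Once decoded correctly, re-encoding "é" as UTF-8 gives the expected bytes
print(decoded.strip().encode("utf-8").hex())
```

The last line prints c3a9, the value the question expected to see in the first place.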

However, the Python 3.9.13 documentation (section 3, "Using Python on Windows", subsection 3.7, "UTF-8 mode") describes an option to change the encoding temporarily or permanently (note the caveat mentioned in the documentation).
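Since cp437 is not the code page on every Windows system (cp850 is common in Western Europe, for example), hard-coding it may not be portable. The following is only a sketch, assuming that the shell writes its output in the console's active code page; console_encoding is a hypothetical helper name, not part of subprocess:

```python
import locale
import sys


def console_encoding():
    """Best guess at the encoding a shell built-in such as echo writes in.

    Hypothetical helper; assumes the child shell uses the active console
    code page (or the system OEM code page if no console is attached).
    """
    if sys.platform == "win32":
        import ctypes

        kernel32 = ctypes.windll.kernel32
        # GetConsoleOutputCP returns 0 when no console is attached,
        # so fall back to the system-wide OEM code page in that case
        code_page = kernel32.GetConsoleOutputCP() or kernel32.GetOEMCP()
        return "cp{}".format(code_page)
    # On other platforms, use the locale's preferred encoding
    return locale.getpreferredencoding(False)


print(console_encoding())
```

The result could then be passed as encoding=console_encoding() to subprocess.run, or used in result.stdout.decode(console_encoding()). On my understanding this is also why sys.stdout.encoding can report UTF-8 while the bytes from cmd still need cp437: the former describes how Python writes to its own stdout, not how the child shell encodes its output.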

Thank you very much it fixed my issue ! There is still one weird thing though: the output of `sys.stdout.encoding` in my python interpreter is "UTF-8" and not "cp437"... But I can only decode stdout with cp437. I'll just assume it's some windows magic :) — PaulM, Sep 01 '22 at 07:54
