
I'm having some issues with Python encoding. When I try to execute this:

subprocess.check_output("ipconfig", shell=True)

it gives me an output with special characters in it, like:

"Statut du m\x82dia"
"M\x82dia d\x82connect\x82"

(I'm French)

When I try decoding it with a .decode() at the end, it gives me this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 78: invalid start byte

I tried using .decode("utf-8") and played around with encoding and decoding for hours, but I can't find the answer. Nothing I found on the internet worked. Maybe I'm just dumb, but hey. What can I do to get rid of those decoding errors and get my special characters printed?

Thanks.

Nim
  • https://stackoverflow.com/questions/9941064/subprocess-popen-with-a-unicode-path https://bugs.python.org/issue27179 – Josh Lee Sep 28 '17 at 19:20
  • You could fix it by adding `encoding="437"` or `encoding="850"` to the arguments, but I wouldn't be happy with that answer since it doesn't explain why the codepage is what it is. – Josh Lee Sep 28 '17 at 19:22
  • @JoshLee Oh man, with `.decode(sys.stdout.encoding)` it works like a charm! Thanks man! Could you make your answer an "official" answer or something, so I can validate it? Thanks again, you made my day, I've been stuck with this for a long time ^^" – Nim Sep 28 '17 at 19:29
  • What Josh Lee said. FWIW, `b"M\x82dia d\x82connect\x82".decode('cp437')` returns `'Média déconnecté'`; you get the same result by decoding with 'cp850'. – PM 2Ring Sep 28 '17 at 19:29
  • @PM2Ring Thank you for your reply, I'll keep those encoding names in mind ^^ – Nim Sep 28 '17 at 19:32
  • You should probably use `sys.getdefaultencoding()` to get the encoding rather than `sys.stdout.encoding`. There's also `sys.getfilesystemencoding()`, which can be the same as `sys.stdout.encoding`, but it can be different, depending on how your system is set up (I don't use Windows, so I can't give any specific advice). – PM 2Ring Sep 28 '17 at 19:33
  • @PM2Ring Alright, thanks for the advice! – Nim Sep 28 '17 at 19:37
  • There's also `locale.getpreferredencoding()`. – Josh Lee Sep 28 '17 at 19:38
  • @PM2Ring @JoshLee `sys.getdefaultencoding()` returns utf-8, so it gives the same error as before. `sys.getfilesystemencoding()` returns "mbcs"; I have no idea what that means, but it doesn't work: it turns the `\x82` into `\u201a`. Same with `locale.getpreferredencoding()`, which returns "cp1252". But it's not important, my question is answered, thanks to both of you ^^ – Nim Sep 28 '17 at 19:43
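
To make the comments above concrete, here is a small sketch that decodes the bytes from the question with each encoding mentioned in the thread. The helper name `try_decode` is made up for illustration; only `bytes.decode` and `UnicodeDecodeError` come from Python itself.

```python
# Bytes exactly as they appear in the question's ipconfig output.
RAW = b"M\x82dia d\x82connect\x82"


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short note if decoding fails.

    Hypothetical helper, only here to compare encodings side by side.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"<error: {exc.reason}>"


if __name__ == "__main__":
    # cp437/cp850 are the console (OEM) code pages named in the comments,
    # cp1252 is what locale.getpreferredencoding() returned for the asker.
    for name in ("cp437", "cp850", "cp1252", "utf-8"):
        print(f"{name:>7}: {try_decode(RAW, name)!r}")
```

Byte 0x82 is "é" in cp437 and cp850, a low quotation mark (`\u201a`) in cp1252, and not a valid start byte in UTF-8, which matches what each commenter reported.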

1 Answer


You're invoking the command through CMD, which has a Unicode mode and an ANSI mode. The "correct" way is to invoke the Unicode mode, but you can add `encoding="437"` or `encoding="850"` to the subprocess call (the `encoding` argument needs Python 3.6 or later) to make it work. The output is not UTF-8; byte 0x82 is "é" in both of those code pages. This only works if you know which code page is currently active.
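
A minimal sketch of that workaround, assuming Python 3.6+ for the `encoding` argument. The helper names `console_encoding` and `run_text` are made up for illustration, and looking up the code page through the Win32 `GetConsoleOutputCP` call is my assumption about one way to find it, not something the answer prescribes; if no console is attached the call returns 0 and the fallback is used.

```python
import subprocess
import sys


def console_encoding(default: str = "cp850") -> str:
    """Guess the console's output code page as a Python codec name.

    Hypothetical helper: on Windows it asks kernel32 for the active
    console output code page; elsewhere, or on failure, it returns
    `default` (cp850 is only an assumed fallback).
    """
    if sys.platform == "win32":
        import ctypes

        codepage = ctypes.windll.kernel32.GetConsoleOutputCP()
        if codepage:  # 0 means the call failed (e.g. no console attached)
            return f"cp{codepage}"
    return default


def run_text(command: str, encoding: str = "") -> str:
    """Run `command` through the shell and return its output as str."""
    return subprocess.check_output(
        command,
        shell=True,
        encoding=encoding or console_encoding(),
    )


if __name__ == "__main__":
    # On a French Windows console this would be the call from the question:
    #     print(run_text("ipconfig"))
    print(run_text("echo hello"))
```

Passing the code page explicitly, as in `run_text("ipconfig", "cp850")`, is the direct equivalent of adding `encoding="850"` to the original call.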

Josh Lee