
I have a very strange and frustrating problem here. Consider a Python file, say test_char.py, with a single line in it:

print(u'\u2212')

If I run this file from the command line, everything works and it prints the proper character.

But if I want to programmatically examine the output of this file by running it with subprocess, as with the following code:

import subprocess
print(subprocess.run("python test_char.py", stdout=subprocess.PIPE))

the script itself fails with the following traceback:

Traceback (most recent call last):
  File "test_char.py", line 1, in <module>
    print(u'\u2212')
  File "C:\<blahblahblah>\Python\Python36\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2212' in position 0: character maps to <undefined>
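The traceback shows the child encoding its output with cp1252. A plausible reading, consistent with how Python 3.6 behaves on Windows: writes to an interactive console go through the Unicode console API (PEP 528), so the character displays fine, but when stdout is a pipe Python falls back to the ANSI code page (cp1252 here), which has no slot for U+2212 MINUS SIGN. The helper `encodes_cleanly` below is hypothetical, for illustration only; it reproduces just the encoding step, independent of subprocess:

```python
def encodes_cleanly(text, codec):
    """Hypothetical helper: True if `text` can be encoded with `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        # Same exception type as in the traceback above.
        return False
    return True


minus = "\u2212"  # MINUS SIGN, the character printed by test_char.py
print("cp1252:", encodes_cleanly(minus, "cp1252"))  # cp1252: False
print("utf-8:", encodes_cleanly(minus, "utf-8"))    # utf-8: True
print(minus.encode("utf-8"))                        # b'\xe2\x88\x92'
```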

I have no idea how to work around this, and it's driving me crazy because I can't examine the output of files that print out these types of characters.
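To check which encoding a child interpreter actually picks when its stdout is a pipe, one can ask it directly. `child_stdout_encoding` is a hypothetical helper for illustration; it uses `sys.executable` instead of the bare `python` command so the child is the same interpreter as the parent. On a Western-locale Windows install of Python 3.6 this would be expected to report cp1252, while most Linux systems report a UTF-8 variant:

```python
import subprocess
import sys


def child_stdout_encoding(env=None):
    """Hypothetical helper: report the encoding a child Python uses for a piped stdout."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        stdout=subprocess.PIPE,  # stdout is a pipe, as in the question
        env=env,                 # None means: inherit the parent's environment
        check=True,
    )
    # Codec names are plain ASCII, so decoding as ASCII is safe here.
    return result.stdout.decode("ascii").strip()


print(child_stdout_encoding())
```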

Maurdekye
  • Did you set the coding to `utf-8`? – cs95 Oct 03 '17 at 00:54
  • Please elaborate. – Maurdekye Oct 03 '17 at 00:54
  • `# -*- coding: utf-8 -*-` – cs95 Oct 03 '17 at 00:57
  • I don't follow. – Maurdekye Oct 03 '17 at 01:00
  • Okay. Add that line at the top of your script and re-run. – cs95 Oct 03 '17 at 01:00
  • I added it to the top of `test_char.py`, and it still fails with the same error. – Maurdekye Oct 03 '17 at 01:03
  • Ah! Guess my expertise in encoding ends here. Sorry, hope you figure it out somehow. – cs95 Oct 03 '17 at 01:04
  • @Maurdekye: Read [(Why) PrintFails](https://wiki.python.org/moin/PrintFails) – unutbu Oct 03 '17 at 01:05
  • But the output *does* print successfully to the console; it only fails when I pipe it through subprocess. – Maurdekye Oct 03 '17 at 01:08
  • When you print through a subprocess, `sys.stdout.encoding` is set to None. So Python does not know how to encode `u'\u2212'` when printing to a pipe. You need to print bytes, not unicode when writing to a pipe. – unutbu Oct 03 '17 at 01:31
  • Setting the [PYTHONIOENCODING](https://docs.python.org/3/using/cmdline.html#envvar-PYTHONIOENCODING) environment variable is one way to avoid the problem -- Python will use the encoding specified there by default. But then code run on your computer may not behave the same as when run on other people's computers. Thus, it is better to be explicit and print only bytes -- in other words, encode `u'\u2212'` before printing: e.g. `print(u'\u2212'.encode('utf-8'))`. – unutbu Oct 03 '17 at 01:33
  • Is there any way to work around it without changing the source file? – Maurdekye Oct 03 '17 at 02:51
  • Setting the PYTHONIOENCODING environment variable can be done without changing the source code. – unutbu Oct 03 '17 at 11:57
  • That seems to have worked. I can mark your post as the correct answer if you make it an answer. – Maurdekye Oct 03 '17 at 16:09
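The PYTHONIOENCODING fix from the comments can be applied from the calling side only, leaving test_char.py untouched. A minimal sketch, assuming the parent is itself Python 3.6 or later (for the `encoding` argument of `subprocess.run`); `run_python_utf8` is a hypothetical helper name, and `sys.executable` is swapped in for the question's `"python"` string so the child is the same interpreter:

```python
import os
import subprocess
import sys


def run_python_utf8(args):
    """Hypothetical helper: run this interpreter with `args`, return stdout as str.

    Setting PYTHONIOENCODING=utf-8 only in the child's environment makes the
    child encode its stdout as UTF-8, so the script being run needs no changes.
    """
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    result = subprocess.run(
        [sys.executable] + list(args),
        stdout=subprocess.PIPE,
        env=env,
        encoding="utf-8",  # decode the child's bytes with the same codec
        check=True,        # raise CalledProcessError if the child exits non-zero
    )
    return result.stdout


# For the question's file this would be run_python_utf8(["test_char.py"]);
# an inline one-liner keeps this example self-contained.
output = run_python_utf8(["-c", "print('\\u2212')"])
# ascii() keeps the parent's own print safe even if its stdout is cp1252.
print(ascii(output))
```

Because `encoding=` puts the pipe in text mode, the returned string already has its line endings normalized to `\n`, even on Windows.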
