
Is there a good reason why I shouldn't start all my Python programs with this? Is anything lost by re-executing the interpreter like this?

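#!/usr/bin/python
import os, sys
# Python 2: stdout has no encoding when it is not a terminal (e.g. a pipe).
if sys.stdout.encoding is None:
    # Set the variable, then re-execute the interpreter so it takes effect.
    os.putenv("PYTHONIOENCODING", 'UTF-8')
    os.execv(sys.executable, ['python'] + sys.argv)
print sys.stdout.encoding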

There are 60 questions about PYTHONIOENCODING, so I guess it's a common problem. In case you don't know: this is done because when sys.stdout.encoding is None you can only print ASCII characters, so e.g. print u"åäö" will throw an exception.

EDIT: This happens to me when stdout is a pipe; python encoding.py | cat sets the encoding to None.

Another solution is to change the codec of stdout with sys.stdout = codecs.getwriter('utf8')(sys.stdout), which I'm guessing is the correct answer despite the comments on that question.
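For readers unfamiliar with the codecs approach mentioned above, here is a minimal sketch of what the wrapper does. The helper name make_utf8_writer and the in-memory stream are illustrative assumptions, not part of the original script; io.BytesIO stands in for a piped stdout so the result can be inspected.

```python
import codecs
import io


def make_utf8_writer(byte_stream):
    """Wrap a byte stream so that text written to it is encoded as UTF-8."""
    return codecs.getwriter('utf8')(byte_stream)


# io.BytesIO is a stand-in for an 8-bit stdout such as a pipe.
fake_stdout = io.BytesIO()
writer = make_utf8_writer(fake_stdout)

# u'\xe5\xe4\xf6' is "åäö", written with escapes so the source file
# encoding does not matter.
writer.write(u'\xe5\xe4\xf6\n')

captured = fake_stdout.getvalue()  # the raw UTF-8 bytes that were written
```

In a real Python 2 script the wrapped object would be sys.stdout itself, as in the one-liner quoted above.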

Erik Johansson

1 Answer


Yes, there is a good reason not to start all your Python programs like that.

First of all:

sys.stdout.encoding is None when Python doesn't know what encoding stdout supports. In most cases that is because stdout doesn't really support any encoding at all. In your case it's because stdout is not a terminal (here, it is a pipe). It can also be None because Python failed to detect the encoding of the terminal.
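A quick way to see which situation applies is to look at the stream's encoding together with isatty(). The sketch below is only illustrative; describe_stream is a hypothetical helper name, and an in-memory stream is used as an example of something that is not a terminal.

```python
import io
import sys


def describe_stream(stream):
    """Return (encoding, is_terminal) for a file-like object."""
    # Not every file-like object has these attributes, so look them up
    # defensively instead of assuming a real terminal.
    encoding = getattr(stream, 'encoding', None)
    is_tty = hasattr(stream, 'isatty') and stream.isatty()
    return encoding, is_tty


# An in-memory text stream is not a terminal and reports no encoding.
in_memory = describe_stream(io.StringIO())

# The result for the real stdout depends on how the script was started
# (terminal, pipe, redirect) and on the Python version.
real = describe_stream(sys.stdout)
```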

Second of all: you set the environment variable and then start a new process with the same command again. That's pretty ugly.

So, unless you plan to be the only one using your programs, you shouldn't start them like that. But if you do plan to be the only one using them, then go ahead.

More in-depth explanation

A better generic solution under Python 2 is to treat stdout as what it is: an 8-bit interface. That means anything you print to stdout should be 8-bit. You get the error when you try to print Unicode data, because print then tries to encode the Unicode data to the encoding of stdout; if that is None it assumes ASCII and fails, unless you set PYTHONIOENCODING.
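The failure can be reproduced without a pipe by doing the encoding step explicitly. This is a rough sketch of the effect, not of what print does internally; try_encode is a hypothetical helper name.

```python
def try_encode(text, encoding):
    """Encode text, returning None instead of raising if it cannot be encoded."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# u'\xe5\xe4\xf6' is "åäö", spelled with escapes.
sample = u'\xe5\xe4\xf6'

as_ascii = try_encode(sample, 'ascii')  # fails: none of these are ASCII
as_utf8 = try_encode(sample, 'utf-8')   # succeeds: six bytes of UTF-8
```

Here as_ascii is None because the ASCII codec cannot represent the characters, which is the same UnicodeEncodeError seen when the output is piped.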

But by printing encoded data, you don't have this problem. The following works perfectly even when the output is piped:

print u'ÅÄÖ'.encode('UTF8')

(However, this will fail under Python 3, because there stdout is no longer 8-bit IO: you are supposed to give it Unicode data, and it will do the encoding itself. If you give it binary data, it prints the repr() of the bytes object instead. Therefore, on Python 3 you don't have this problem in the first place.)
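The Python 3 behaviour described above can be observed with an in-memory stream. This sketch assumes Python 3 only (it uses the print function with file=), and the TextIOWrapper around a BytesIO buffer is a stand-in for a UTF-8 stdout, not the real sys.stdout.

```python
import io

# A text layer that encodes to UTF-8 on top of a byte buffer, similar in
# spirit to how Python 3 layers sys.stdout over a byte stream.
raw = io.BytesIO()
text_out = io.TextIOWrapper(raw, encoding='utf-8')

# u'\xc5\xc4\xd6' is "ÅÄÖ", spelled with escapes.
print(u'\xc5\xc4\xd6', file=text_out)                  # text: encoded for us
print(u'\xc5\xc4\xd6'.encode('utf-8'), file=text_out)  # bytes: repr is printed
text_out.flush()

lines = raw.getvalue().decode('utf-8').splitlines()
```

The first line comes out as the three characters, while the second is the literal text of the bytes repr, which is why pre-encoding before print is the wrong approach on Python 3.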

Lennart Regebro
  • Why is it a bad idea, you don't really mention that... What's worse not getting any data from a program, or getting it in the wrong encoding? My problem is that this is being run by ~20 people and I need to tell them all to set PYTHONIOENCODING, so this is a sane default. – Erik Johansson Apr 01 '13 at 20:50
  • @ErikJohansson: It's worse getting it in the wrong encoding. Otherwise you get an error which you can fix. There is no reason for you to need to set PYTHONIOENCODING, the error is somewhere else. – Lennart Regebro Apr 02 '13 at 06:51
  • @Lennart_Regebro: No. While I agree that exec is an ugly solution, defaulting to something usable as the output encoding is not wrong. An application has to be able to output data to work. – Erik Johansson Apr 02 '13 at 12:36
  • Also see question linked above, and I tried this in p3k and it seems to use my locale as default encoding. – Erik Johansson Apr 02 '13 at 12:37
  • @ErikJohansson: It sounds to me like these are custom scripts especially made for your environment. If so, then as I mentioned in my answer, setting it to UTF-8 is fine. But for generic open source scripts it's not. Setting it to the locale encoding is much more sensible. You can also look at sys.stdout.isatty() to check if it's a terminal or not. – Lennart Regebro Apr 02 '13 at 12:52
  • Also "restarting" the process is kinda ugly, why not just set PYTHONIOENCODING for all your users all the time? – Lennart Regebro Apr 02 '13 at 13:17
  • They are not my users; the script is really old and has been in use for a while. My first attempt was to tell everyone to set PYTHONIOENCODING, but that fails when systems get reinstalled. My second attempt was to use bash and exec from there. – Erik Johansson Apr 03 '13 at 08:18
  • @ErikJohansson: OK. If I were you I'd either force the people to use Python 3 (and port the script, which may be tricky, but probably isn't), or change the script to only print 8-bit "bytes" strings. Both are the "right thing" to do, and likely to have less issues than any other solution. If there are a lot of print statements just telling people to use Python 3 is easier. ;-) – Lennart Regebro Apr 03 '13 at 08:29
  • I don't like that "åäö".encode() tip. Most people know that you can re-encode strings; what would be helpful is a way to set a default. I'm not sure what the best practice would be for setting such a default. – Erik Johansson Apr 03 '13 at 08:30
  • Yes, going with Python 3 is a nice idea, but porting the script isn't really worth the effort. – Erik Johansson Apr 03 '13 at 08:32
  • Regarding your last point: if I paste this onto my (utf8) terminal: `python -c 'import sys; print sys.stdout.encoding; print u"ÅÄÖ".encode("UTF8")' | cat`, I get the wrong output (it prints `ÃÃÃ`). This is because of the implicit encoding that happens with the program source text. The code that python actually sees and runs is `print u"\xc3\x85\xc3\x84\xc3\x96"...`. So... beware (: – jwd Feb 01 '17 at 18:39