9

Having learned how to read Unicode files in a Python 3.0 web script, I now need to learn how to use print() with Unicode.

I searched for how to write Unicode; for example, this question explains that you can't write Unicode characters to a non-Unicode console. In my case, however, the output goes to Apache, which I am sure can handle Unicode text. Yet for some reason the stdout of my web script is in ASCII.

Obviously, if I were opening a file for writing myself, I would do something like

open(filename, 'w', encoding='utf8')

but since I'm given an open stream, I resorted to using

sys.stdout.buffer.write(mytext.encode('utf-8'))

and everything seems to work. Does this violate some rule of good behavior, or does it have any unintended consequences?

ilya n.
  • you can write Unicode characters that are not supported by the current (Windows) console encoding if you use Win32 API such as `WriteConsoleW()`. [`win-unicode-console` Python package mentioned below](http://stackoverflow.com/a/29543612/4279) does it for you. Though it has nothing to do with Apache. – jfs Apr 11 '15 at 14:15

2 Answers

11

I don't think you're breaking any rule, but

sys.stdout = codecs.EncodedFile(sys.stdout, 'utf8')

looks like it might be handier / less clunky.

Edit: per comments, this isn't quite right -- @Miles gave the right variant (thanks!):

sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer) 
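A minimal sketch of what that wrapper does, using an io.BytesIO object as a stand-in for sys.stdout.buffer so the bytes can be inspected. The helper name write_utf8 and the sample string are only for illustration.

```python
import codecs
import io


def write_utf8(binary_stream, text):
    """Wrap a binary stream in a UTF-8 StreamWriter and write text to it."""
    writer = codecs.getwriter("utf8")(binary_stream)
    writer.write(text)  # the text is encoded to UTF-8 and handed to the stream
    return writer


# io.BytesIO stands in for sys.stdout.buffer (an assumption for this demo).
fake_buffer = io.BytesIO()
write_utf8(fake_buffer, "caf\u00e9 \u2603\n")
encoded = fake_buffer.getvalue()
print(encoded)
```

In a real script the same call would receive sys.stdout.buffer instead of the stand-in.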

Edit: if you can arrange for environment variable PYTHONIOENCODING to be set to utf8 when Apache starts your script, that would be even better, making sys.stdout be set to utf8 automatically; but if that's unfeasible or impractical the codecs solution stands.
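To see the effect of PYTHONIOENCODING without touching Apache, here is a rough sketch that starts a child interpreter with the variable set and asks it what encoding its stdout got. The helper name child_stdout_encoding is made up for this example, and how Apache itself would pass the variable is not covered here.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child Python with PYTHONIOENCODING set and report its stdout encoding."""
    # Copy the current environment and add the variable for the child only.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


reported = child_stdout_encoding("utf8")
print(reported)
```

The exact spelling of the reported name may differ between Python versions, so compare it through codecs.lookup() rather than as a raw string.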

Alex Martelli
  • With this line I get "TypeError: can't write bytes to text stream" – ilya n. Jun 11 '09 at 22:35
  • I think it's because stdout starts already being a text stream with a *wrong* ascii codec. – ilya n. Jun 11 '09 at 22:36
  • 3
    Try: sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer) – Miles Jun 12 '09 at 00:34
  • @Miles, you have it just right -- hope you don't mind if I edit my answer to include your better idea...! – Alex Martelli Jun 12 '09 at 01:20
  • 1
    No problem. I didn't make my own answer because I'm not sure what constitutes "best practice" for a lot of Python 3 encoding issues. One thing I don't like is that, if all references to the original stdout TextIOWrapper are lost (if sys.__stdout__ is overwritten, for instance), the underlying buffer will be closed, and there is no way around that, AFAICT, other than to make sure a reference is maintained. – Miles Jun 12 '09 at 05:19
  • To be quite honest: nobody (including us, Python core committers) is sure what's "best practice" in Python 3 either, YET -- we're all still figuring it out!-). So another +1 on your latest comment...;-) – Alex Martelli Jun 12 '09 at 05:58
  • Thanks to all! That works, although I'm still a bit scared -- we the simple folk were taught to use the highest level abstraction possible... – ilya n. Jun 12 '09 at 08:45
  • Using "the highest _feasible_ level of abstraction" is a good rule of thumb. If you can arrange environment variable PYTHONIOENCODING to be set to 'utf8' when Apache runs your code, that would be even better, I'm editing the answer to reflect that; but how to arrange it is more of a sysadm problem (httpd.conf? wrapper shell script?) so I'm not getting into that. – Alex Martelli Jun 12 '09 at 14:26
  • When I use this answer (@Miles's), and then call the builtin input('a prompt'), it fails with "AttributeError: 'BufferedWriter' object has no attribute 'encoding'" from codecs.py. (I'm using Python 3.0.) Perhaps I'm doing something obviously dumb, being new to Python 3. My workaround: print the prompt in a separate statement and use no-arg input(). – Darius Bacon Aug 31 '10 at 06:34
  • if `PYTHONIOENCODING` can't be set for some reason, then `sys.stdout=io.TextIOWrapper(sys.stdout.detach(), encoding='utf-8')` could be used instead of `codecs` module. – jfs Apr 11 '15 at 14:19
1

This is an old question, but I'll add my version here since I landed here first before finding my solution.

One issue with codecs.getwriter is that, if you are running a script of sorts, the output will be buffered (whereas Python's stdout normally prints after every line).

sys.stdout in the console is an io.TextIOWrapper, so my solution uses that. This also allows you to set line_buffering=True or False.

For example, to make stdout backslash-escape any characters it cannot encode, instead of raising an error:

sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding=sys.stdout.encoding,
                              errors="backslashreplace", line_buffering=True)
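As an illustration of the backslashreplace handler, here is a small sketch on a stand-in stream. The io.BytesIO object plays the role of the buffer returned by sys.stdout.detach(), and the ASCII encoding is chosen only so that the sample character is unencodable.

```python
import io

# Stand-in for the binary buffer behind stdout (an assumption for this demo).
raw = io.BytesIO()

# ASCII cannot represent U+2603, so the error handler decides what happens.
out = io.TextIOWrapper(raw, encoding="ascii",
                       errors="backslashreplace", line_buffering=True)
out.write("snowman: \u2603\n")  # the newline triggers a flush (line_buffering)

escaped = raw.getvalue()
print(escaped)
```

Only the character that does not fit the encoding is replaced by its escape sequence; the rest of the text passes through unchanged.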

To force a specific encoding (in this case utf8):

sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding="utf8",
                              line_buffering=True)
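The effect of line_buffering can be checked the same way. This sketch, again with io.BytesIO standing in for the real buffer and a made-up helper name, records which bytes have reached the underlying stream right after a write and again after an explicit flush.

```python
import io


def bytes_after_write(line_buffering):
    """Return (bytes visible right after write, bytes visible after flush)."""
    raw = io.BytesIO()  # stand-in for the detached stdout buffer
    out = io.TextIOWrapper(raw, encoding="utf8",
                           line_buffering=line_buffering)
    out.write("h\u00e9llo\n")
    before_flush = raw.getvalue()
    out.flush()
    after_flush = raw.getvalue()
    return before_flush, after_flush


buffered_before, buffered_after = bytes_after_write(line_buffering=False)
line_before, line_after = bytes_after_write(line_buffering=True)
print(buffered_before, buffered_after)
print(line_before, line_after)
```

With line_buffering=True the line is already in the underlying stream before flush() is called, which is the behavior you want for output that should appear as it is printed.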

A note: calling sys.stdout.detach() leaves the original wrapper unusable. sys.__stdout__ initially refers to that same original object, and some modules use it, so you may want to set that as well:

sys.stdout = sys.__stdout__ = io.TextIOWrapper(sys.stdout.detach(), encoding=sys.stdout.encoding, errors="backslashreplace", line_buffering=True)
sys.stderr = sys.__stderr__ = io.TextIOWrapper(sys.stderr.detach(), encoding=sys.stdout.encoding, errors="backslashreplace", line_buffering=True)
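To see why any remaining reference to the old wrapper matters, here is a sketch on a stand-in stream (io.BytesIO instead of the real stdout buffer; the variable names are illustrative). After detach(), the old wrapper refuses further writes, while the new wrapper keeps working on the same buffer.

```python
import io

raw = io.BytesIO()                              # stand-in for the real buffer
old = io.TextIOWrapper(raw, encoding="ascii")   # plays the original sys.stdout
new = io.TextIOWrapper(old.detach(), encoding="utf8", line_buffering=True)

# The old wrapper no longer has a buffer, so writing through it fails.
try:
    old.write("x")
    old_usable = True
except ValueError:
    old_usable = False

new.write("\u2603\n")  # the new wrapper writes UTF-8 to the same buffer
print(old_usable, raw.getvalue())
```

Anything that still holds the old object, as sys.__stdout__ does by default, would hit the same ValueError, which is the reason for rebinding both names.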
khazhyk
  • I've seen very similar solutions in several places, but I found a problem with it (Windows, python 3.6): If you do something like "myprog.py | head", then python throws a strange error: "Exception ignored in: – joeking Jun 10 '17 at 00:13
  • Interesting... I can reproduce the following error when stdout is presumably closed before reading all of it, on cmd.exe and msys bash. On 3.5 and 3.6... Traceback (most recent call last): File "crash_in_head.py", line 7, in print('hi') OSError: [Errno 22] Invalid argument Exception ignored in: <_io.TextIOWrapper name='' encoding='utf8'> OSError: [Errno 22] Invalid argument Your suggestion does fix that issue! – khazhyk Jun 12 '17 at 03:05