31

I've got a Python script that outputs Unicode to the console, and I'd like to redirect its output to a file. Apparently, redirecting in Python involves converting the output to a string, so I get errors about being unable to decode Unicode characters.

So then, is there any way to perform a redirect into a file encoded in UTF-8?

Chris Seymour
alexgolec
  • 1
    Possible duplicate of [Setting the correct encoding when piping stdout in Python](http://stackoverflow.com/questions/492483/setting-the-correct-encoding-when-piping-stdout-in-python) – techraf Jan 10 '16 at 23:13

4 Answers

32

When printing to the console, Python looks at sys.stdout.encoding to determine the encoding to use to encode unicode objects before printing.

When redirecting output to a file, sys.stdout.encoding is None in Python 2, so Python 2 falls back to the ascii encoding. (In contrast, Python 3 uses the locale's preferred encoding, which is UTF-8 on most modern Linux and macOS systems.) This often leads to a UnicodeEncodeError when printing unicode.
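
The failure can be reproduced without any redirection, since it comes down to encoding non-ASCII text with the ascii codec. A minimal sketch, assuming a sample string (the name `text` and its value are illustrative and not taken from the original script):

```python
# Illustrative sample; `text` is a hypothetical stand-in for the script's output.
text = u"caf\u00e9"

try:
    # Roughly what happens when stdout has no encoding and ascii is used instead.
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly as UTF-8 works for any unicode text.
utf8_bytes = text.encode("utf-8")
```

Here `ascii_failed` ends up true because U+00E9 has no ASCII representation, while the UTF-8 encoding yields two bytes for that character.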

You can avoid the error by explicitly encoding the unicode yourself before printing:

print(unicode_obj.encode('utf-8'))

or you could redefine sys.stdout so all output is encoded in utf-8:

import sys
import codecs
# Wrap stdout so unicode objects are encoded as UTF-8 on write (Python 2).
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
print(unicode_obj)
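
On Python 3.7 and later, text streams have a `reconfigure` method, so the same effect may be achievable without `codecs.getwriter`. A hedged sketch of that idea; the helper names `make_utf8_stream` and `force_utf8_stdout` are hypothetical and exist only for this example:

```python
import io
import sys


def make_utf8_stream(binary_stream):
    """Wrap a binary stream so text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8")


def force_utf8_stdout():
    """Switch sys.stdout to UTF-8 in place where the stream supports it."""
    # reconfigure() exists on io.TextIOWrapper from Python 3.7 onward;
    # replaced or captured stdout objects may not provide it.
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8")
```

Calling `force_utf8_stdout()` once at the start of a script, before any `print`, is the intended usage. `make_utf8_stream` shows the same wrapping on an arbitrary binary stream such as `io.BytesIO()`.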
unutbu
  • I tried encoding my string before printing it to the console. My file has b symbol and \x01 ... codepoints... When it wrote to the file, it was not decoded – Malwaregeek Mar 23 '16 at 05:59
15

Set the environment variable PYTHONIOENCODING to the encoding you want before redirecting a Python script to a file. That way you don't have to modify the original script. Make sure the script writes Unicode strings, otherwise PYTHONIOENCODING has no effect: byte strings are sent as-is to the terminal (or redirected file).
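
The effect of the variable can be checked from Python itself by launching a child interpreter with its stdout piped, which is the same situation as a shell redirect. A sketch assuming Python 3.5+ for `subprocess.run`; the helper name `run_with_io_encoding` is made up for this example:

```python
import os
import subprocess
import sys


def run_with_io_encoding(code, encoding):
    """Run `code` in a child Python with PYTHONIOENCODING set; return raw stdout bytes."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


# The child prints U+00E9; the bytes it emits depend on PYTHONIOENCODING.
utf8_output = run_with_io_encoding(r"print('\u00e9')", "utf-8")
latin1_output = run_with_io_encoding(r"print('\u00e9')", "latin-1")
```

The same character comes out as two bytes under `utf-8` and as one byte under `latin-1`, which shows that the variable controls how text written to the redirected stdout is encoded.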

Mark Tolonen
  • Running `PYTHONENCODING=utf8 script.py` does not work for me at all. Neither does PYTHONSTDENCODING, which I saw proposed somewhere else. Maybe I'm doing it wrong. – qwertyboy Nov 26 '13 at 12:33
  • 1
    @qwertyboy, you're doing it wrong. Neither of those spellings are the spelling in my answer. – Mark Tolonen Nov 26 '13 at 15:51
  • You are right, of course. Sorry about that, (it was a late hour) and thanks. But eventually I did try your spelling as well, here is a copy from my terminal: `PYTHONIOENCODING=utf-8 ./quote.py > log.txt`, and it didn't work. The file command outputs `log.txt: Non-ISO extended-ASCII text` – qwertyboy Nov 27 '13 at 11:49
  • 1
    @qwertyboy, I don't have a Linux system handy, but is `quote.py` printing Unicode strings? That's all it will affect. If it is printing byte strings they dump their bytes to the display untranslated. Another possibility is your terminal isn't configured to decode UTF-8 and a simple `cat log.txt` won't display correctly. – Mark Tolonen Nov 27 '13 at 20:01
  • The confusing bit is precisely that when I run quote.py on its own, it displays Unicode strings on my terminal perfectly. Only when I redirect it into a file does it revert to latin1, and I find myself resorting to `./quote.py | iconv -c -t utf8 > log.txt`. Explicitly marking the strings as Unicode (u'string') seems to help, though, when combined with PYTHONIOENCODING. Thanks. – qwertyboy Nov 28 '13 at 01:31
5

Under Linux, you can use tee and redirect stderr to /dev/null.

python script.py 2>/dev/null | tee filename.txt

You also don't need to modify your Python script.

Favonius
3
import codecs

# Open the file with an explicit UTF-8 codec so unicode text is encoded on write.
with codecs.open("filename", "w", "utf-8") as file_object:
    file_object.write(u"खऔणन")


This should do the job.
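
On Python 3 (and on Python 2.6+ through `io.open`), the regular file API accepts an `encoding` argument directly, so `codecs.open` is optional. A small sketch that writes to a temporary file; the helper name `write_utf8` and the temporary path are assumptions made for the example:

```python
import io
import os
import tempfile


def write_utf8(path, text):
    """Write `text` to `path`, encoding it as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


# Write the sample text to a temporary file and read the raw bytes back.
sample = u"खऔणन"
fd, temp_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    write_utf8(temp_path, sample)
    with open(temp_path, "rb") as handle:
        raw_bytes = handle.read()
finally:
    os.remove(temp_path)
```

Reading the file back in binary mode gives the UTF-8 bytes of the sample, which is a quick way to confirm what actually landed on disk.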

Lelouch Lamperouge