2

I have the following simple program:

# -*- coding: utf-8 -*-

GREEK = u'ΑΒΓΔ ΕΖΗΘ ΙΚΛΜ ΝΞΟΠ ΡΣΤΥ ΦΧΨΩ αβγδ εζηθ ικλμ νξοπ ρςτυ φχψω'

print GREEK

Running this in the terminal produces the expected output:

$ python test.py
ΑΒΓΔ ΕΖΗΘ ΙΚΛΜ ΝΞΟΠ ΡΣΤΥ ΦΧΨΩ αβγδ εζηθ ικλμ νξοπ ρςτυ φχψω

But piping the output to another program causes an error:

$ python test.py | less

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    print GREEK
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
  • Why is this failing? Why is redirection affecting the way the program is run? I would have expected that a program run in the shell is always redirected: sometimes to a terminal program, sometimes to another program (less in this case). Why is the "destination" program affecting the execution of the source program?
  • What can I do to make sure that the program runs independently of whether it is sent to the terminal or to another destination?
blueFast
  • and http://stackoverflow.com/questions/17918746/print-unicode-string-to-console-ok-but-fails-when-redirect-to-a-file-how-to-fix/17918823#17918823 and http://stackoverflow.com/questions/17419126/understanding-python-unicode-and-linux-terminal and http://stackoverflow.com/questions/17430168/python-encoding-issue-when-using-linux – Josh Lee Oct 31 '13 at 01:09
  • First, do _you_ know what encoding the other program expects for its stdin? If so, you have to tell Python. If not, there's no reasonable answer here, because Python can't magically figure it out when you can't… – abarnert Oct 31 '13 at 01:09

2 Answers

7

Based on this and other related questions, I have implemented the following solution, which seems to be quite robust and does not require changing any of the print statements in my codebase:

# -*- coding: utf-8 -*-

import codecs
import sys

def set_output_encoding(encoding='utf-8'):
    '''When writing to the terminal, Python knows the encoding needed and
       sets it automatically. But when piping to another program (for example,
       | less), Python cannot detect the output encoding. In that case, the
       stream's encoding is None. Catch this situation for both stdout and
       stderr and force the encoding.'''
    # Wrap only the streams that report no encoding (i.e. not a terminal)
    if sys.stdout.encoding is None:
        sys.stdout = codecs.getwriter(encoding)(sys.stdout)
    if sys.stderr.encoding is None:
        sys.stderr = codecs.getwriter(encoding)(sys.stderr)

GREEK = u'ΑΒΓΔ ΕΖΗΘ ΙΚΛΜ ΝΞΟΠ ΡΣΤΥ ΦΧΨΩ αβγδ εζηθ ικλμ νξοπ ρςτυ φχψω'

set_output_encoding()

print GREEK
print >> sys.stderr, GREEK

To test this:

python ddd.py             # Do not pipe anything
python ddd.py | less      # Pipe stdout, let stderr go to the terminal
python ddd.py 2>&1 | less # Pipe both stdout and stderr to less

All of them produce the expected:

ΑΒΓΔ ΕΖΗΘ ΙΚΛΜ ΝΞΟΠ ΡΣΤΥ ΦΧΨΩ αβγδ εζηθ ικλμ νξοπ ρςτυ φχψω
ΑΒΓΔ ΕΖΗΘ ΙΚΛΜ ΝΞΟΠ ΡΣΤΥ ΦΧΨΩ αβγδ εζηθ ικλμ νξοπ ρςτυ φχψω
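
The wrapping step can also be tried in isolation, without touching the real sys.stdout. The following is only a sketch for illustration, not part of the solution above: `wrap_if_unencoded` is a hypothetical helper name, and an in-memory `io.BytesIO` stands in for a pipe that reports no encoding. It is written so that it should run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import codecs
import io


def wrap_if_unencoded(stream, encoding='utf-8'):
    """Return `stream` wrapped in an encoding writer if it reports no encoding.

    Hypothetical helper for illustration only; it applies a check similar
    to the one in set_output_encoding() to an arbitrary stream.
    """
    if getattr(stream, 'encoding', None) is None:
        return codecs.getwriter(encoding)(stream)
    return stream


# io.BytesIO has no 'encoding' attribute, so it plays the role of a piped stdout.
fake_pipe = io.BytesIO()
writer = wrap_if_unencoded(fake_pipe)
writer.write(u'ΑΒΓΔ')

# The unicode text reached the byte stream encoded as UTF-8.
written = fake_pipe.getvalue()
```

A stream that already reports an encoding is returned unchanged, which is why output to the terminal is left alone.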
blueFast
0

When the output is piped, Python 2 cannot detect an encoding for stdout and falls back to ASCII, which cannot represent these characters. An alternative is to always encode anything that leaves your program, and decode it back when you need it.

# -*- coding: utf-8 -*-

GREEK = u'ΑΒΓΔ ΕΖΗΘ ΙΚΛΜ ΝΞΟΠ ΡΣΤΥ ΦΧΨΩ αβγδ εζηθ ικλμ νξοπ ρςτυ φχψω'

print GREEK.encode('utf-8')

This works; however, what gets written is the UTF-8 encoded byte string, so it displays as the original text only if the terminal or the receiving program also uses UTF-8.
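
To make the encode/decode round trip concrete, here is a minimal sketch. It assumes UTF-8 on both sides, uses a shortened sample string, and the names `encoded`, `decoded` and `ascii_ok` are illustrative only. It should run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
GREEK = u'ΑΒΓΔ ΕΖΗΘ'

# Encode explicitly at the output boundary: the result is a byte string.
encoded = GREEK.encode('utf-8')

# A consumer that knows the encoding can recover the original text.
decoded = encoded.decode('utf-8')

# ASCII cannot represent these characters; this is the same error
# that the traceback in the question reports.
try:
    GREEK.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

The round trip is lossless only when the reader decodes with the same encoding the writer used.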

aIKid
  • This is an alternative, but it means a lot of changes in my code: now all print statements must be changed :( – blueFast Oct 31 '13 at 07:33
  • I have another question: how does the program I pipe to (`less` in this case) know the encoding of the input stream? – blueFast Oct 31 '13 at 07:34
  • AFAIK, that feature isn't available yet. The only method is to make sure everything is encoded properly. – aIKid Oct 31 '13 at 07:41
  • With the search-and-replace feature of most editors, that shouldn't be a hard problem. – aIKid Oct 31 '13 at 07:43
  • It *can be done*, but it is very error prone. Anyway, thanks for the suggestion. The solution I am trying to implement is less invasive: see my answer below. – blueFast Oct 31 '13 at 07:51