
I am using a simple Python script, simple.py, to get reservation results for my CID:

import requests

# Query parameters for the reservation lookup (placeholder values)
data = {"minorRev": "current minorRev #", "cid": "xxx", "apiKey": "xxx", "customerIpAddress": "  ", "creationDateStart": "03/31/2013"}

url = 'http://someservice/services/rs/'
req = requests.get(url, params=data)
print req
print req.text
print req.status_code

When I run python simple.py at the command prompt, it works perfectly and prints req.text.

However when I try to do

python simple.py | grep pattern

I get

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 1314: ordinal not in range(128)
Nazim Kerimbekov
Deepankar Bajpeyi
  • See: http://stackoverflow.com/questions/2596714/why-does-python-print-unicode-characters-when-the-default-encoding-is-ascii – Ceasar Apr 01 '13 at 07:01
  • Read through [this](http://stackoverflow.com/questions/1473577/writing-unicode-strings-via-sys-stdout-in-python). Basically, when piping the output, `sys.stdout.encoding == None` – shx2 Apr 01 '13 at 07:02

2 Answers


In Python 2, print has to encode a unicode object before writing it to stdout. When the process's output goes to a pipe, sys.stdout.encoding is None, so print falls back to the ascii codec; if the unicode object contains any non-ASCII character, a UnicodeEncodeError is raised.
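The failure can be reproduced without a pipe by encoding to ASCII explicitly, which is what print ends up doing when no stdout encoding is set. This is only a minimal sketch: the helper name `ascii_encode_error` is made up for illustration, and the script is written so that it also runs under Python 3, where stdout normally keeps an encoding even in a pipe.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys


def ascii_encode_error(text):
    """Return the UnicodeEncodeError raised by encoding `text` as ASCII, or None.

    Hypothetical helper, not part of the original script.
    """
    try:
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        return exc
    return None


# Under Python 2 the encoding is None when stdout is a pipe; on a terminal
# it is usually the terminal's encoding.
print('isatty:', sys.stdout.isatty(),
      'encoding:', getattr(sys.stdout, 'encoding', None))

err = ascii_encode_error(u'\xc1lvaro')  # u'\xc1' is the character 'Á'
print('ascii codec fails at position:', err.start)
```

Running it once directly and once through `| cat` shows how the reported encoding changes between a terminal and a pipe.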

You can solve this problem by encoding every unicode object before sending it to standard output (but you'll need to guess which codec to use). See these examples:

File wrong.py:

# coding: utf-8

print u'Álvaro'

Result:

alvaro@ideas:/tmp
$ python wrong.py 
Álvaro
alvaro@ideas:/tmp
$ python wrong.py | grep a
Traceback (most recent call last):
  File "wrong.py", line 3, in <module>
    print u'Álvaro'
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc1' in position 0: ordinal not in range(128)

File right.py:

# coding: utf-8

print u'Álvaro'.encode('utf-8')
# an encoded unicode object is a `str` (byte string) in Python 2

Result:

alvaro@ideas:/tmp
$ python right.py 
Álvaro
alvaro@ideas:/tmp
$ python right.py | grep a
Álvaro
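If you would rather not guess the codec on every call, one option is to use the stream's own encoding when it has one and only fall back to a fixed codec when it is None. The sketch below is an assumption layered on top of right.py, not something from the original answer: `encode_for`, `PipedStream`, and the `utf-8` fallback are all illustrative choices.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def encode_for(stream, text, fallback='utf-8'):
    """Encode `text` with the stream's encoding, or `fallback` if it has none.

    Hypothetical helper; the utf-8 fallback is an assumption about the
    environment, not something the stream reports.
    """
    encoding = getattr(stream, 'encoding', None) or fallback
    return text.encode(encoding)


class PipedStream(object):
    """Stand-in for a Python 2 sys.stdout attached to a pipe."""
    encoding = None


class Latin1Stream(object):
    """Stand-in for a stream that declares its own encoding."""
    encoding = 'latin-1'


# Show the encoded bytes for each kind of stream.
print(repr(encode_for(PipedStream(), u'\xc1lvaro')))
print(repr(encode_for(Latin1Stream(), u'\xc1lvaro')))
```

In the Python 2 script from the question this would be used as `print encode_for(sys.stdout, req.text)`.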
Álvaro Justen

If sys.stdout.isatty() is false (the output is redirected to a file or pipe), set the PYTHONIOENCODING environment variable outside your script. Always print Unicode; don't hardcode your environment's character encoding inside the script:

$ PYTHONIOENCODING=utf-8 python simple.py | grep pattern
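To check what the variable does, you can start a child interpreter with its stdout attached to a pipe and ask it for sys.stdout.encoding. This is a rough sketch; the function name `child_stdout_encoding` is invented for the example, and it uses whichever interpreter runs the script.

```python
import os
import subprocess
import sys


def child_stdout_encoding(ioencoding=None):
    """Report sys.stdout.encoding as seen by a child Python whose stdout is a pipe.

    Hypothetical helper: `ioencoding` is the value given to PYTHONIOENCODING,
    or None to leave the variable unset.
    """
    env = dict(os.environ)
    env.pop('PYTHONIOENCODING', None)
    if ioencoding is not None:
        env['PYTHONIOENCODING'] = ioencoding
    # check_output captures the child's stdout through a pipe.
    out = subprocess.check_output(
        [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
        env=env)
    return out.decode('ascii').strip()


print('with PYTHONIOENCODING=utf-8:', child_stdout_encoding('utf-8'))
print('without PYTHONIOENCODING:', child_stdout_encoding())
```

Under Python 2 the second line should report None, which is the situation that triggers the UnicodeEncodeError in the question.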
jfs