Unicode error when printing from Python to Heroku logs

Question

I have a Python script that runs periodically on Heroku using their Scheduler add-on. It prints some debug info, but when the text contains a non-ASCII character, I get an error like this in the logs:

SyntaxError: Non-ASCII character '\xc2' in file send-tweet.py on line 40, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

That's when I have a line like this in the script:

print u"Unicode test: £ ’ …"

I'm not sure what to do about this. If I have this in the script:

import locale
print u"Encoding: %s" % locale.getdefaultlocale()[1]

then this is output in the logs:

Encoding: UTF-8

So, why is it trying, and failing, to output other text in ASCII?

UPDATE: FWIW, here's the actual script I'm using. The debugging output is on lines 38-39.

What happens if you define the source code encoding as described here? http://www.python.org/dev/peps/pep-0263/ — Ci3, Feb 15 '13 at 16:21
Like Chris Harris said, how about using a `# coding=utf8` (or perhaps `utf-8`?) at the top of your file? — alxbl, Feb 15 '13 at 16:22
Also, did you have a look at this answer? http://stackoverflow.com/a/6289494/1343005 — Ci3, Feb 15 '13 at 16:23
Phil, please amend your sample with code that actually writes to the log. The `print` statement above most certainly does not produce the error when you already declared the encoding, but some other statement. And see my note in the answer below about having to use `.encode('utf-8')` when writing Unicode characters to a byte-sized stream. — nikola, Feb 15 '13 at 17:40
I'm not sure what you mean about the `print` statement "does not produce the error". It definitely produced the error for me when printing to Heroku's logs. But using `.encode('utf-8')` got it working, thanks. — Phil Gyford, Feb 15 '13 at 17:53
Ok, now I get it: on Heroku the output of the `print` statement is apparently being piped to the log. — nikola, Feb 15 '13 at 17:59

nikola · Accepted Answer · 2013-12-20T10:32:00.357

As the error says:

no encoding declared

i.e. there is no encoding declared in your Python source file.

The linked PEP tells you how to declare an encoding in your Python source. The declared encoding must match the one your editor/IDE actually uses to save the file, including the £ from your example. That is most likely UTF-8, so put this on the first line of your send-tweet.py:

# coding=utf-8

If the first line already contains a shebang line like:

#!/usr/local/bin/python

then put the encoding directive on the second line, e.g.

#!/usr/local/bin/python
# coding=utf-8
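
To see which declaration a file actually carries, you can match its first two lines against a pattern along the lines of the one given in PEP 263. This is only a rough sketch, not what the interpreter itself runs: `declared_encoding` is a made-up helper name, and it ignores details such as a UTF-8 byte order mark.

```python
import re

# Pattern modelled on the one PEP 263 gives for the declaration comment.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source_lines):
    """Return the encoding named on line 1 or 2, or None if there is none."""
    # Only the first two lines of a file may carry the declaration.
    for line in source_lines[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    header = ["#!/usr/local/bin/python", "# coding=utf-8"]
    print(declared_encoding(header))
```

A declaration on the third line or later is not picked up, which is why it has to go directly below the shebang line.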

Also, when writing Unicode characters in your Python source and declaring UTF-8 encoding, you must use an editor with UTF-8 file saving support, i.e. an editor that can serialize Unicode code points to UTF-8.

In this regard, please note that Unicode and UTF-8 are not the same thing. Unicode is the standard that assigns a code point to each character, while UTF-8 is one specific encoding that serializes those code points to bytes. It is backward compatible with ASCII and uses 1 to 4 bytes per code point.
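
As a small illustration, here is how many bytes UTF-8 needs for a few code points, including the three non-ASCII characters from the question's test string. The helper name `utf8_lengths` and the sample list are made up for this sketch. The characters are written as escapes so the snippet does not depend on how the file itself is saved.

```python
# Code points written as escapes: "A", POUND SIGN, RIGHT SINGLE QUOTATION
# MARK, HORIZONTAL ELLIPSIS, and one code point outside the Basic
# Multilingual Plane.
samples = [u"A", u"\u00a3", u"\u2019", u"\u2026", u"\U0001F600"]


def utf8_lengths(chars):
    """Return the number of bytes each character occupies in UTF-8."""
    return [len(c.encode("utf-8")) for c in chars]


if __name__ == "__main__":
    for char, size in zip(samples, utf8_lengths(samples)):
        # repr() keeps the output ASCII-only, whatever the terminal supports.
        print("%r -> %d byte(s)" % (char, size))
```

The pound sign encodes to the two bytes `\xc2\xa3`. The first of these is the `'\xc2'` that the SyntaxError in the question complains about.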

So in the Python interpreter a string might be stored as Unicode, but if you want to write a Unicode string as UTF-8 you need to explicitly serialize the string to UTF-8 first, e.g.

s.encode("utf-8")

This is especially important when writing Unicode strings to byte-oriented streams, e.g. a log file handle. Such a stream expects bytes rather than Unicode code points, so content with non-ASCII characters has to be encoded first, typically as UTF-8.
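
A minimal sketch of that idea, assuming the log is fed from the script's standard output (which is what the comments on the question suggest for Heroku). The helper name `log_line` is made up. The `getattr` lookup is there only so the same code also runs on Python 3, where the byte stream is `sys.stdout.buffer`.

```python
import sys


def log_line(text, stream=None):
    """Encode text as UTF-8 and write it, plus a newline, to a byte stream."""
    if stream is None:
        # Python 3 exposes the underlying byte stream as sys.stdout.buffer;
        # on Python 2, sys.stdout itself accepts byte strings.
        stream = getattr(sys.stdout, "buffer", sys.stdout)
    data = text.encode("utf-8") + b"\n"
    stream.write(data)
    return data


if __name__ == "__main__":
    # Same test string as in the question, written with escapes.
    log_line(u"Unicode test: \u00a3 \u2019 \u2026")
```

Encoding explicitly means the output no longer depends on what encoding Python guesses for the stream. When stdout is piped rather than attached to a terminal, Python 2 may fall back to ASCII for `print`.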

Thanks for this, and the comments above. I've now tried this, and I still get the same error... — Phil Gyford, Feb 15 '13 at 17:09
A Mac. I've just updated the question with a link to the actual script I'm using, in case that helps. — Phil Gyford, Feb 15 '13 at 17:20
Sublime has the option "Save with Encoding" -> "UTF-8". Did you do this? — nikola, Feb 15 '13 at 17:28
I just tested your line, with an encoding declared, on my Mac, saved with Sublime as UTF-8. I got no error, so the line itself is fine. — nikola, Feb 15 '13 at 17:34
Yes I can run it fine on my Mac, but it's when outputting to Heroku logs that I get the error. — Phil Gyford, Feb 15 '13 at 17:36
Aha, your updated answer does the trick - adding .encode('utf-8') finally has it working. Thanks so much! — Phil Gyford, Feb 15 '13 at 17:46
