I have a Python script that runs periodically on Heroku using their Scheduler add-on. It prints some debug info, but when there's a non-ASCII character in the text, I get an error in the logs like:

SyntaxError: Non-ASCII character '\xc2' in file send-tweet.py on line 40, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

That's when I have a line like this in the script:

print u"Unicode test: £ ’ …"

I'm not sure what to do about this. If I have this in the script:

import locale
print u"Encoding: %s" % locale.getdefaultlocale()[1]

then this is output in the logs:

Encoding: UTF-8

So, why is it trying, and failing, to output other text in ASCII?

UPDATE: FWIW, here's the actual script I'm using. The debugging output is in lines 38-39.

Phil Gyford
  • What happens if you define the source code encoding as described here? http://www.python.org/dev/peps/pep-0263/ – Ci3 Feb 15 '13 at 16:21
  • Like Chris Harris said, how about using a `# coding=utf8` (or perhaps `utf-8`?) at the top of your file? – alxbl Feb 15 '13 at 16:22
  • Also, did you have a look at this answer? http://stackoverflow.com/a/6289494/1343005 – Ci3 Feb 15 '13 at 16:23
  • Phil, please amend your sample with code that actually writes to the log. The ``print`` statement above most certainly does not produce the error when you already declared the encoding, but some other statement. And see my note in the answer below about having to use ``.encode('utf-8')`` when writing Unicode characters to a byte-sized stream. – nikola Feb 15 '13 at 17:40
  • I'm not sure what you mean about the `print` statement "does not produce the error". It definitely produced the error for me when printing to Heroku's logs. But using `.encode('utf-8')` got it working, thanks. – Phil Gyford Feb 15 '13 at 17:53
  • Ok, now I get it: in Heroku the ``print()`` statement is being piped to the log, apparently. – nikola Feb 15 '13 at 17:59

1 Answer

As the error says:

no encoding declared

i.e. there is no encoding declared in your Python source file.

The linked PEP explains how to declare an encoding in your Python source. The declared encoding must match the one your editor/IDE uses when it saves the file containing the £ character from your example. That is most likely UTF-8, so put this on the first line of send-tweet.py:

# coding=utf-8

If the first line is already a shebang line like:

#!/usr/local/bin/python

then put the encoding directive on the second line, e.g.

#!/usr/local/bin/python
# coding=utf-8

Also, when writing Unicode characters in your Python source and declaring UTF-8 encoding, you must use an editor with UTF-8 file saving support, i.e. an editor that can serialize Unicode code points to UTF-8.

In this regard, please note that Unicode and UTF-8 are not the same. Unicode is the standard that assigns a code point to each character, while UTF-8 is one specific encoding that serializes those code points into bytes. UTF-8 is backward compatible with ASCII and uses 1 to 4 bytes per code point.
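As a rough illustration of the 1-to-4-byte point, the sketch below encodes an ASCII letter and the three non-ASCII characters from the question and reports how many bytes each one takes in UTF-8. The helper name `utf8_hex` is made up for this example, and the characters are written as `\u` escapes so the snippet itself stays pure ASCII. It is meant to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function, unicode_literals
import binascii


def utf8_hex(text):
    """Return the UTF-8 bytes of `text` as a lowercase hex string (hypothetical helper)."""
    return binascii.hexlify(text.encode("utf-8")).decode("ascii")


# "A", POUND SIGN, RIGHT SINGLE QUOTATION MARK, HORIZONTAL ELLIPSIS
for ch in ["A", "\u00a3", "\u2019", "\u2026"]:
    encoded = ch.encode("utf-8")
    # Only ASCII is printed here, so this works even on an ASCII-only stream.
    print("U+%04X -> %d byte(s): %s" % (ord(ch), len(encoded), utf8_hex(ch)))
```

Note that the pound sign encodes to the two bytes `c2 a3`; the `\xc2` in the SyntaxError above is the first of those bytes as found in the source file.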

So in the Python interpreter a string might be stored as Unicode, but if you want to write a Unicode string as UTF-8 you need to explicitly serialize the string to UTF-8 first, e.g.

s.encode("utf-8")

This matters especially when writing Unicode strings to byte-oriented streams, e.g. a log file handle, which expects bytes rather than Unicode characters. For content that contains non-ASCII characters, that means encoding it first, typically as UTF-8.
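A minimal sketch of that idea, assuming an in-memory `io.BytesIO` as a stand-in for the real log stream (how Heroku actually captures output is not shown here). It writes the test string once by encoding it explicitly and once through an `io.TextIOWrapper` that does the encoding for you. The names `byte_stream`, `raw` and `text_stream` are just illustrative. For context, in Python 2 a piped standard output usually has no encoding set, so printing a Unicode string falls back to ASCII, which is why the explicit `.encode("utf-8")` helps.

```python
from __future__ import unicode_literals
import io

# The test string from the question, written with escapes to keep the source ASCII.
message = "Unicode test: \u00a3 \u2019 \u2026"

# Option 1: encode explicitly before writing to a byte-oriented stream.
byte_stream = io.BytesIO()
byte_stream.write(message.encode("utf-8"))

# Option 2: wrap the byte stream so that text is encoded on the way out.
raw = io.BytesIO()
text_stream = io.TextIOWrapper(raw, encoding="utf-8")
text_stream.write(message)
text_stream.flush()  # push buffered text down into `raw`

# Both approaches put the same UTF-8 bytes on the underlying stream.
same_bytes = byte_stream.getvalue() == raw.getvalue()
```

The explicit version is closest to the `s.encode("utf-8")` call above; the wrapper is convenient when many writes go to the same stream.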

nikola