12

I am writing a .py file that contains strings from multiple character sets, including English, Spanish, and Russian. For example, I have something like:

string_en = "The quick brown fox jumped over the lazy dog."  
string_es = "El veloz murciélago hindú comía feliz cardillo y kiwi."
string_ru = "В чащах юга жил бы цитрус? Да, но фальшивый экземпляр!"

I am having trouble figuring out how to encode my file to avoid generating syntax errors like the one below when my file is run:

SyntaxError: Non-ASCII character '\xc3' in file example.py on line 128, but no encoding
declared; see http://www.python.org/peps/pep-0263.html for details

I've tried adding # -*- coding: utf-8 -*- to the beginning of my file, but without any luck. I've also tried marking my strings as unicode (e.g. string_en = u"The quick brown fox jumped over the lazy dog."), again unsuccessfully.

Is it possible to include characters from different Python codecs in one file, or am I attempting to do something that is not allowed?

Katrina
  • 2
    "Multiple encodings" is just a batshit insane idea. If you mean to use unicode, well, there should be no problem. Just make sure it's actually a unicode file. What errors do you get when you add the encoding declaration and use unicode strings? –  Feb 14 '11 at 17:33
  • This particular error _seems_ to indicate that your file is UTF-8 encoded, given the presence of the `\xc3` byte. I just tried and got the same error. Adding `# coding: utf-8` **on the second line** of my script fixed it. – Eric Redon Feb 14 '11 at 17:43
  • Thanks all for the suggestions. Not sure what I was doing wrong yesterday, but including either `# coding: utf-8` or `# -*- coding: utf-8 -*-` is working fine for me today. FYI, I am using GNU Emacs 22.1. My default encoding system is mule-utf-8 [found using buffer command `C-h C coding`]. – Katrina Feb 15 '11 at 17:27

2 Answers

13

There are two aspects to proper encoding of strings in your use case:

  1. For Python to understand that you are using UTF-8 encoding, you must include, on the first or second line of your file, a comment that looks like # coding=utf-8. See PEP 0263 for details.

  2. Your editor must also save the file as UTF-8. How to configure this depends on the editor you are using. Configuration for Emacs and Vim is addressed in the same PEP; Eclipse can default to the filesystem encoding, which itself can be derived from your locale settings.
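A minimal sketch of the two points together, assuming the editor really does save the file as UTF-8. The variable names are taken from the question; the `encoded_es` name and the round-trip check are only there for illustration. The `u` prefix is accepted by Python 2 and by Python 3.3 and later.

```python
# -*- coding: utf-8 -*-
# The declaration above has to sit on line 1 or 2 of the file (PEP 263),
# and the file itself has to be saved as UTF-8 by the editor.

string_en = u"The quick brown fox jumped over the lazy dog."
string_es = u"El veloz murciélago hindú comía feliz cardillo y kiwi."
string_ru = u"В чащах юга жил бы цитрус? Да, но фальшивый экземпляр!"

# Illustration only: encode one string to UTF-8 bytes. The accented
# characters take more than one byte each, so the byte string is longer.
encoded_es = string_es.encode("utf-8")

# Every string should survive a UTF-8 encode/decode round trip.
for text in (string_en, string_es, string_ru):
    assert text.encode("utf-8").decode("utf-8") == text
```

If the declaration is missing, or the file is saved in a different encoding than the one declared, Python 2 fails while reading the source, before any of these lines run.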

Eric Redon
1

You have to add # -*- coding: XXXX -*- at the beginning of the file, replacing XXXX with the encoding your editor uses to save your source file.

Which editor are you using? Can you check in the editor settings which encoding is being used to save the file?
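If the editor settings are hard to find, one rough way to test a guess is to read the raw bytes of the saved file and try to decode them. The sketch below is only an illustration: `decodes_as` is a hypothetical helper, not part of any library, and the temporary file stands in for your real source file.

```python
import io
import os
import tempfile


def decodes_as(path, encoding):
    """Return True if all bytes of the file at `path` decode under `encoding`."""
    with io.open(path, "rb") as handle:
        raw = handle.read()
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Stand-in for a source file that an editor saved as UTF-8.
sample = u'string_es = u"El veloz murci\u00e9lago"\n'
fd, sample_path = tempfile.mkstemp(suffix=".py")
os.close(fd)
with io.open(sample_path, "wb") as handle:
    handle.write(sample.encode("utf-8"))

is_utf8 = decodes_as(sample_path, "utf-8")    # the bytes are valid UTF-8
is_ascii = decodes_as(sample_path, "ascii")   # the accented letter is not ASCII
os.remove(sample_path)
```

A check like this can only rule encodings out. Single-byte encodings such as latin-1 accept any byte sequence, so a successful decode does not prove the guess is right.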

nosklo