5
string = "Deepika Padukone, Esha Gupta or Yami Gautam - Who's looks hotter and sexier? Vote! - It's ... Deepika Padukone, Esha Gupta or Yami Gautam…. Deepika Padukone, Esha Gupta or Yami Gautam ... Tag: Deepika Padukone, Esha Gupta, Kalki Koechlin, Rang De Basanti, Soha Ali Khan, Yami  ... Amitabh Bachchan and Deepika Padukone to be seen in Shoojit Sircar's Piku ..."

fp = open("test.txt", "w+")

fp.write("%s" % string)

After running the above code, I got the following error:

File "encode_error.py", line 1

SyntaxError: Non-ASCII character '\xe2' in file encode_error.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
Martijn Pieters
user3770743

2 Answers

6

You have a U+2026 HORIZONTAL ELLIPSIS character in your string definition:

... Deepika Padukone, Esha Gupta or Yami Gautam…. ...
                                               ^

Python 2 requires that you declare the source code encoding if you use any non-ASCII characters in your source.

Your options are to:

  • Declare the encoding, as specified in the linked PEP 263. It is a comment that must be on the first or second line of your source file.

    What you set it to depends on your code editor. If you are saving files encoded as UTF-8, then the comment looks something like:

    # coding: utf-8
    

    but the format is flexible. You can spell it encoding too, for example, and use = instead of :.

  • Replace the horizontal ellipsis with three dots, as used in the rest of the string

  • Replace the codepoint with \xhh escape sequences to represent encoded data. U+2026 encoded to UTF-8 is \xe2\x80\xa6.
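
The last two options can be sketched as follows. This is a minimal illustration, not the asker's actual code: the variable names are made up for the example, and the ellipsis is written as a \u escape so the snippet itself stays pure ASCII and needs no encoding declaration.

```python
# Sketch only; the variable names are illustrative, not from the question.
ELLIPSIS = u"\u2026"  # U+2026 HORIZONTAL ELLIPSIS, written as an escape

# Encoding U+2026 to UTF-8 gives the three bytes \xe2\x80\xa6
# (the \xe2 is the byte the SyntaxError complains about).
utf8_bytes = ELLIPSIS.encode("utf-8")
assert utf8_bytes == b"\xe2\x80\xa6"

# Replacing the single ellipsis character with three ASCII dots.
snippet = u"Esha Gupta or Yami Gautam\u2026."
ascii_snippet = snippet.replace(ELLIPSIS, u"...")
print(ascii_snippet)
```

After the replacement the string contains only ASCII characters, so it can be pasted into a source file without a coding comment.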
Martijn Pieters
  • How do I declare the source code encoding in my source? I am a newbie to Python. – user3770743 Jun 24 '14 at 10:32
  • 1
    The string posted above was taken from a JSON object and I need to extract the string. So it may not be possible to replace the ellipsis with 3 dots. – user3770743 Jun 24 '14 at 10:34
  • @user3770743: Why not load the JSON data from a file or HTTP response with the `json` module then? – Martijn Pieters Jun 24 '14 at 10:35
  • @user3770743: If you're reading it from a JSON object, you'll run into an overlapping but different set of problems than if you try to embed the string into source code. – user2357112 Jun 24 '14 at 10:37
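
A rough sketch of the approach suggested in the comments: parse the JSON with the json module instead of pasting the string into source code. The payload, the "title" key, and the save_title helper are all assumptions made for illustration; the real JSON structure is not shown in the question.

```python
import io
import json

# Hypothetical payload; the "title" key is an assumption for illustration.
# The ellipsis arrives as the JSON escape \u2026, so this file stays ASCII.
raw = '{"title": "Esha Gupta or Yami Gautam\\u2026. Deepika Padukone"}'


def save_title(raw_json, path):
    """Parse raw_json, write its "title" value to path as UTF-8, return it."""
    data = json.loads(raw_json)
    text = data["title"]  # a Unicode string that contains U+2026
    # io.open accepts an explicit encoding on both Python 2 and Python 3.
    with io.open(path, "w", encoding="utf-8") as fp:
        fp.write(text)
    return text
```

Calling save_title(raw, "test.txt") would write the text, ellipsis included, without any non-ASCII character ever appearing in the source file.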
5

Add # coding: utf-8 to the top of your file.

# coding: utf-8
string = "Deepika Padukone, Esha Gupta or Yami Gautam - Who's looks hotter and sexier? Vote! - It's ... Deepika Padukone, Esha Gupta or Yami Gautam…. Deepika Padukone$

fp = open("test.txt", "w+")

fp.write("%s" % string)

Explanation:

The error is caused by standard ASCII characters, such as the apostrophe ('), being replaced with non-ASCII look-alikes, such as the curly quotation mark (’), when text is copied. This happens quite often when you copy text from a PDF file. The difference looks subtle, but it matters a great deal to Python: the apostrophe is a legal string delimiter, but the curly quotation mark is not.

Technically, it is not illegal to use whatever characters we want. We just have to tell Python which encoding we are using so that it knows how to interpret the non-ASCII characters. Adding # coding: utf-8 to the top of the file tells Python that the file is encoded as UTF-8.

UTF-8 is an encoding format to represent the characters in the Unicode set. It is used very widely on the web. Unicode is the industry standard for representing and handling text on many different platforms including the web, enterprise software, printing etc. UTF-8 is one of the more popular ways used for encoding this character set.
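
To make the distinction between Unicode and UTF-8 concrete, here is a small sketch; the choice of sample characters is mine, not the original poster's. Each character is a single Unicode code point, while its UTF-8 form is a sequence of one or more bytes.

```python
# Sample characters: ASCII apostrophe, curly quotation mark, horizontal ellipsis.
samples = [u"'", u"\u2019", u"\u2026"]

# Pair each code point with the length of its UTF-8 encoding in bytes.
sizes = [(ord(ch), len(ch.encode("utf-8"))) for ch in samples]
for code_point, n_bytes in sizes:
    print("U+%04X -> %d byte(s) in UTF-8" % (code_point, n_bytes))

# Decoding the bytes recovers the original character.
assert b"\xe2\x80\xa6".decode("utf-8") == u"\u2026"
```

The ASCII apostrophe fits in one byte, which is why plain ASCII source needs no declaration, while the curly quote and the ellipsis each take three bytes.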

griffon vulture