
I get the error below when I run the following code in the Cloud9 IDE, using its default version of Python (2.7.6):

import urllib
artistValue = "Sigur Rós"
artistValueUrl = urllib.quote(artistValue)

SyntaxError: Non-ASCII character '\xc3' in file /home/ubuntu/workspace/test.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

I read that changing the code to the following was a workaround.

import urllib
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
artistValue = "Sigur Rós"
artistValueUrl = urllib.quote(artistValue)

When I tried this, the editor showed a red x pop-up error that read:

Module 'sys' has no 'setdefaultencoding' member

and if I run the code I still get the SyntaxError.

Why is this happening and what should I do?

EDIT: I also tried the following from the selected answer:

import urllib
print urllib.quote(u"Sigur Rós")

When I ran it I received the following error:

SyntaxError: Non-ASCII character '\xc3' in file /home/ubuntu/workspace/test.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Jason Melo Hall
  • Sorry, `sys.setdefaultencoding('utf-8')` won't work in the Cloud9 IDE; see [the sys docs](https://docs.python.org/2/library/sys.html#sys.setdefaultencoding) for details. And it's not a good idea anyway, see [Dangers of sys.setdefaultencoding('utf-8')](http://stackoverflow.com/q/28657010/4014959) for info. Please post your code (in a code block) & a sample of the data you're trying to read (also in a code block) so we can help you fix your problem. Also mention what Python version you're using, since Python 2 & Python 3 handle Unicode differently. – PM 2Ring Nov 17 '15 at 05:13
  • Thank you for the feedback. I did my best to edit what you recommended above. Do you have any suggestions? – Jason Melo Hall Nov 20 '15 at 05:17

1 Answer


Ok, that's a bit weird. The Python interpreter should give a SyntaxError complaining about the non-ASCII character in your source code if you don't declare an encoding at the start of the script; OTOH, if you have declared an encoding (or Cloud9 does it automatically), then the Python interpreter ought to treat it as a UTF-8 encoded string.

I'm not familiar with Cloud9, so I can't guarantee that this will work, but it ought to. :)

Make your string a Unicode string (by using the u string prefix) and then explicitly encode it to UTF-8:

import urllib

artistValue = u"Sigur Rós"
artistValueUrl = urllib.quote(artistValue.encode('utf-8'))
print artistValueUrl

output

Sigur%20R%C3%B3s

edit

What happens if you run this:

# -*- coding: utf-8 -*-
import urllib
print urllib.quote("Sigur Rós")
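
For reference, the `# -*- coding: utf-8 -*-` line is the PEP 263 encoding declaration that the SyntaxError message points to. The sketch below is only an illustration: `declared_encoding` is a hypothetical helper, not part of Python, and it is a simplified check that applies the pattern quoted in PEP 263 to the first two lines of a script.

```python
import re

# Pattern quoted from PEP 263. The interpreter only looks for the
# declaration on the first or second line of the source file.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source_lines):
    """Return the declared source encoding, or None if none is found.

    Hypothetical helper for illustration; a simplified version of the
    check the interpreter performs.
    """
    for line in source_lines[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


with_decl = ["# -*- coding: utf-8 -*-", "import urllib"]
without_decl = ["import urllib", "print urllib.quote('...')"]

print(declared_encoding(with_decl))     # utf-8
print(declared_encoding(without_decl))  # None
```

When no declaration is found, Python 2 assumes ASCII source, which is why a literal `ó` in the file triggers the error.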

The following should work. Of course, this isn't a practical way to enter such strings into your script, I'm just trying to get a handle on what Cloud9 is doing.

import urllib
print urllib.quote("Sigur R\xc3\xb3s")
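
The `\xc3\xb3` escapes are the two bytes that UTF-8 uses for `ó`, and `\xc3` is the byte named in the SyntaxError message. A minimal sketch of how to derive them, written with a `\u` escape so the file stays pure ASCII and needs no encoding declaration:

```python
# U+00F3 is LATIN SMALL LETTER O WITH ACUTE. Writing it as an escape
# keeps this source file ASCII-only.
name = u"Sigur R\u00f3s"

# Encoding to UTF-8 turns the single code point into two bytes.
utf8_bytes = name.encode("utf-8")
print(repr(utf8_bytes))
```

On Python 2 this prints `'Sigur R\xc3\xb3s'`; on Python 3 the same bytes are shown with a `b` prefix.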

And I guess you might as well also try this, just so we can see what error message it produces:

import urllib
print urllib.quote(u"Sigur Rós")
PM 2Ring
  • @JasonMeloHall: That's bizarre! That error message should only be produced if you pass a Unicode string to `urllib.quote`, it shouldn't happen if you pass a UTF-8 encoded string. I've added a couple of other things to try to my answer. Please paste the error messages into your question, since they're hard to read in comments. – PM 2Ring Nov 21 '15 at 12:25
  • Explicitly writing the unicode character in your second answer was the trick! Thanks a bunch. – Jason Melo Hall Nov 22 '15 at 00:20
  • Do you have any ideas on how I could automate this if it came in from something other than a manual entry? Should I just start another question for that? – Jason Melo Hall Nov 22 '15 at 00:29
  • @JasonMeloHall: Handling external data is a little different to handling literal strings (i.e., strings that are part of the source code of your script). If the data is already encoded in UTF-8 then simply pass it to `urllib.quote()`. Otherwise, you need to know what the encoding is so you can decode the data bytes correctly to Unicode, then encode that Unicode to UTF-8 bytes, and then call `urllib.quote()` on the UTF-8 bytes. – PM 2Ring Nov 22 '15 at 12:47
  • @JasonMeloHall: I'm glad that the 2nd answer (after the edit) worked, but I'm curious about what result / error message the 1st & 3rd answers (after the edit) gave; I expect the 3rd one to give that "UnicodeWarning: Unicode equal comparison failed ..." message. FWIW, I created that string in the 2nd answer by doing `s=u"Sigur Rós";print repr(s.encode('utf8'))` so it's perplexing that UTF-8 encoding isn't working properly for you. – PM 2Ring Nov 22 '15 at 12:49
  • @JasonMeloHall: Maybe you could try my very first answer again, but make sure you tell your editor to save your script with UTF-8 encoding, and include the `# -*- coding: utf-8 -*-` directive at the top of the script. – PM 2Ring Nov 22 '15 at 12:50
  • Thanks, this information helped me work out that I need to declare the charset when working with my MySQL database. – Jason Melo Hall Nov 22 '15 at 20:43
  • I don't know what I was doing the first time, but I tried your first answer again and it worked. The third answer spits out "SyntaxError: Non-ASCII character '\xc3' in file /home/ubuntu/workspace/test.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details" – Jason Melo Hall Nov 22 '15 at 20:46
  • @JasonMeloHall: Excellent! Thanks for doing those tests - they may be helpful for future readers. The 3rd answer was just to make sure that Cloud9 isn't invisibly declaring UTF-8 encoding for you. BTW, you may find this article helpful: [Pragmatic Unicode](http://nedbatchelder.com/text/unipain.html), which was written by SO veteran Ned Batchelder; I guess I should've mentioned it earlier... – PM 2Ring Nov 23 '15 at 12:59
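
The recipe given in the comments for external data (decode the bytes using their known encoding, encode the result to UTF-8, then quote) can be sketched roughly as follows. `quote_from_encoding` is a hypothetical helper name, and the Latin-1 input is only an assumed example of data that arrives in another encoding. The import fallback lets the same sketch run on Python 3, where `quote` lives in `urllib.parse`.

```python
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3


def quote_from_encoding(raw_bytes, source_encoding):
    """Percent-encode external bytes as UTF-8.

    Hypothetical helper: source_encoding must be known in advance,
    it is not detected here.
    """
    text = raw_bytes.decode(source_encoding)   # bytes -> Unicode
    return quote(text.encode("utf-8"))         # Unicode -> UTF-8 -> quoted


# Assumed example input: the artist name encoded as Latin-1.
latin1_bytes = b"Sigur R\xf3s"
print(quote_from_encoding(latin1_bytes, "latin-1"))  # Sigur%20R%C3%B3s
```

If the data is already UTF-8, the decode and encode steps cancel out and the bytes can be passed to `quote` directly.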