
I'm using BeautifulSoup, and I get back a string like this:

u'Dassault Myst\xe8re'

It's a unicode object, but I want it to look like this:

'Dassault Mystère'

I have tried encode('utf-8'), decode() and unicode(), for example:

name = name.encode('utf-8')

The error I keep getting is:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8'

My default encoding seems to be 'ascii': sys.getdefaultencoding() returns 'ascii' even though I have:

#!/usr/bin/env python
# encoding: utf-8

at the top of the file.

Hoping to solve this recurring Unicode issue once and for all!

Thanks

James

1 Answer


I don't know how or where you get this message, but look at this example:

$ python
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> txt = u'Dassault Myst\xe8re'
>>> txt
u'Dassault Myst\xe8re'
>>> print txt
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 13: ordinal not in range(128)
>>> ^D
$ export LANG=en_US.UTF-8
$ python
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> txt = u'Dassault Myst\xe8re'
>>> txt
u'Dassault Myst\xe8re'
>>> print txt
Dassault Mystère
>>> ^D

So, as you can see, if your console is ASCII-only, print converts the unicode string to ASCII, and if the string contains a character outside the ASCII range, an exception is raised.

But if the console accepts Unicode (here, after setting a UTF-8 locale), everything is displayed correctly.
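If you cannot change the console's locale, one option is to encode the string yourself before writing it out. This is only a sketch: the helper name to_console_bytes is made up for illustration, and falling back to 'ascii' when sys.stdout.encoding is unset is my assumption, not something Python requires. It is written to run on both Python 2 and Python 3.

```python
from __future__ import print_function
import sys

txt = u'Dassault Myst\xe8re'


def to_console_bytes(text, encoding=None):
    """Encode text for a console, replacing characters it cannot represent.

    Hypothetical helper: falls back to the stream's encoding, then to ASCII.
    """
    encoding = encoding or sys.stdout.encoding or 'ascii'
    # 'replace' substitutes '?' for unencodable characters instead of raising
    # UnicodeEncodeError.
    return text.encode(encoding, 'replace')


# An ASCII-only console loses the accented character but does not crash.
ascii_bytes = to_console_bytes(txt, 'ascii')

# A UTF-8 console receives the two-byte sequence for U+00E8.
utf8_bytes = to_console_bytes(txt, 'utf-8')

# repr() keeps the output pure ASCII, so this print is safe in any locale.
print(repr(ascii_bytes))
print(repr(utf8_bytes))
```

The point is that the unicode object itself is fine; the error only appears at the moment it is converted to bytes with a codec that cannot represent every character.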

Jerzyk
  • Well, that fixed the printing-to-console issue. But I still have a problem when building a URL: when I append u'Dassault Myst\xe8re' to the URL, urllib2 chokes on it when making an HTTP request. I guess it's expecting an ASCII string, and I'm sending something else? – James Mar 12 '11 at 22:44
  • My URL looks like this: u'http://www.youtube.com/results?search_query=Dassault+Myst\xe8re&aq=0' and urllib2 doesn't seem to like that. – James Mar 12 '11 at 22:47
  • 2nd part solved using this answer: http://stackoverflow.com/questions/4389572/how-to-fetch-a-non-ascii-url-with-python-urlopen – James Mar 12 '11 at 22:55
  • I believe urllib2 works as expected with "from __future__ import unicode_literals". – fiacre Dec 09 '15 at 05:34
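For the URL problem raised in the comments, a common approach is to encode the search term to UTF-8 bytes and then percent-encode it before putting it in the query string. The sketch below assumes the server expects UTF-8 in the query, which I have not verified for this site. The URL is the one from the comment above. The try/except import is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib import quote_plus  # Python 2
except ImportError:
    from urllib.parse import quote_plus  # Python 3

name = u'Dassault Myst\xe8re'

# Encode to UTF-8 bytes first, then percent-encode; quote_plus also turns
# spaces into '+', which matches the query string shown in the comment.
query = quote_plus(name.encode('utf-8'))

# The result is plain ASCII, so it can be passed to urllib2.urlopen
# (or urllib.request.urlopen on Python 3) without an encoding error.
url = 'http://www.youtube.com/results?search_query=' + query + '&aq=0'
```

Only the user-supplied part is quoted here. Quoting the whole URL would also escape the ':' and '?' characters that need to stay literal.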