
I am having a problem with my encoding in Python. I have tried different methods but I can't seem to find the best way to encode my output to UTF-8.

This is what I am trying to do:

result = unicode(google.searchGoogle(param), "utf-8").encode("utf-8")

searchGoogle returns the first Google result for param.

This is the error I get:

exceptions.TypeError: decoding Unicode is not supported

Does anyone know how I can make Python encode my output in UTF-8 to avoid this error?

simonbs

1 Answer


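It looks like google.searchGoogle(param) already returns a unicode object. Calling unicode() with an encoding argument on a value that is already unicode raises exactly this error: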

>>> unicode(u'foo', 'utf-8')
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    unicode(u'foo', 'utf-8')
TypeError: decoding Unicode is not supported

So what you want is:

result = google.searchGoogle(param).encode("utf-8")

As a side note, your code assumes that searchGoogle returns a UTF-8 encoded byte string. Even if it did, decoding it (with unicode()) and then encoding it back (with .encode()) using the same encoding would just give you the original bytes, so what was the point?
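
To illustrate the side note, here is a minimal sketch. The value `text` is a hypothetical stand-in for a unicode result from searchGoogle (an assumption, since the real return value is not shown), and the `u""` and `b""` literals are used so the snippet should run on both Python 2.7 and Python 3.3+.

```python
# Hypothetical stand-in for a unicode result from searchGoogle (an assumption).
text = u"caf\xe9"

# Encoding the text gives its UTF-8 bytes; this is all the fix needs to do.
encoded = text.encode("utf-8")

# Decoding and re-encoding with the same codec returns identical bytes,
# so a decode step before .encode() adds nothing even when the input is bytes.
round_trip = encoded.decode("utf-8").encode("utf-8")
```

Here `encoded` and `round_trip` hold the same byte string, which is why the extra `unicode(...)` call could simply be dropped.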

yak
  • Honestly, the `unicode()` was just fooling around trying to understand what was happening. Thank you very much :-) – simonbs Oct 04 '11 at 06:25
  • Now I will sometimes get `'ascii' codec can't decode byte 0xc3 in position`. Do you know why that is? – simonbs Oct 04 '11 at 09:05
  • In the line I suggested? Then it would mean that searchGoogle() returned a byte string (`str`) containing a 0xC3 byte. Calling `.encode()` on a byte string makes Python convert it to unicode first, using the default ascii encoding, and that implicit decode is what fails. I don't know why searchGoogle() would sometimes return unicode and sometimes a byte string. Maybe it depends on what you give it in `param`? Try to stick to one type. – yak Oct 05 '11 at 10:37
  • I wish there was a safe, simple way to cast to unicode. – Eric Walker Oct 21 '14 at 00:45
  • @EricWalker You could write an awkward helper function like `def uors2u(object, encoding=..., errors=...)` which will return `object` param unchanged if it is already in Unicode or convert it if str. However, this code smells. You should be converting all input to Unicode as soon as you receive it from the outside (like a file system) and converting it back if needed before sending it back. There should be only one place where you convert str to unicode, so a helper function like the one I described should not be needed. – Leonid Dec 13 '17 at 05:34
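
The two points raised in the comments can be sketched together: why `.encode()` on a byte string fails under Python 2, and what a helper along the lines Leonid describes might look like. `to_unicode` and `text_type` are hypothetical names, not part of any library, and UTF-8 as the default encoding is an assumption. The snippet is written so that it should run on both Python 2.7 and Python 3.

```python
import sys

# Python 2 names the text type `unicode`; Python 3 names it `str`.
# The conditional expression only evaluates the branch it picks.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821


def to_unicode(value, encoding="utf-8", errors="strict"):
    """Return text unchanged; decode byte strings with the given encoding."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):  # `bytes` is an alias of `str` on Python 2
        return value.decode(encoding, errors)
    raise TypeError("expected text or bytes, got %s" % type(value).__name__)


raw = b"caf\xc3\xa9"  # UTF-8 bytes, as if received from outside the program

# On Python 2, raw.encode("utf-8") first decodes raw as ASCII implicitly.
# Doing that decode explicitly shows the same failure on either version.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

from_bytes = to_unicode(raw)        # decoded from UTF-8 bytes
from_text = to_unicode(u"caf\xe9")  # already text, returned unchanged
```

In line with Leonid's remark, a helper like this is best kept at the single place where data enters the program, so that everything downstream only ever sees unicode.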