5

I'm looking for a way to convert a variable (which could be an ASCII string, a unicode string with extra characters like é or £, or a float or integer) into a unicode string.

variable.encode('utf-8') where variable is an integer results in AttributeError: 'int' object has no attribute 'encode'

str(variable).encode('utf-8') where variable is the string '£' results in UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

Is there an easy way to do what I'm looking for in Python 2.7? Or do I have to check the type of variable and process it differently?

frankjwu

2 Answers

4

Encoding would never result in a unicode object. You decode from bytes to unicode.

As such, you'd convert to str (a byte string) then to unicode by decoding:

str(obj).decode('utf8')

This will still fail for objects that are already unicode values, so you may want to use try..except to catch that case:

try:
    obj = str(obj).decode('utf8')
except UnicodeEncodeError:
    # already unicode
    pass
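
An explicit type check is another way to handle the already-unicode case, and it also answers the "do I have to check the type" part of the question. This is only a minimal sketch, assuming byte strings are UTF-8 encoded; `to_unicode`, `text_type` and `binary_type` are hypothetical names, and the version check exists only so the snippet also runs on Python 3:

```python
import sys

# Hypothetical aliases so the sketch runs on both Python 2.7 and Python 3.
if sys.version_info[0] >= 3:
    text_type, binary_type = str, bytes
else:
    text_type, binary_type = unicode, str  # noqa: F821 (Python 2 only)


def to_unicode(value, encoding='utf-8'):
    """Return value as a unicode string (hypothetical helper)."""
    if isinstance(value, text_type):
        return value                   # already unicode: nothing to do
    if isinstance(value, binary_type):
        return value.decode(encoding)  # byte string: decode explicitly
    return text_type(value)            # int, float and similar


pound = to_unicode(b'\xc2\xa3')   # UTF-8 bytes for the pound sign
accent = to_unicode(u'\xe9')      # already a unicode string
number = to_unicode(124)          # an integer
```

Compared with the `try..except` version, this never calls `str()` on a unicode value, so there is no exception to catch. On Python 2 it can still fail for custom objects whose string form contains non-ASCII bytes.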

If you try to encode a byte string, Python 2 first implicitly decodes it to unicode using the default ASCII codec, which is why you got your UnicodeDecodeError.
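
That hidden decode step can be reproduced by hand. A small sketch, assuming the '£' in the question was stored as UTF-8 bytes (the 0xc2 in the error message suggests so); the variable names are illustrative only:

```python
pound_bytes = b'\xc2\xa3'  # assumed UTF-8 encoding of the pound sign

# Decoding as ASCII is what Python 2 attempts implicitly before encoding
# a byte string; it fails on the first byte outside the ASCII range.
try:
    pound_bytes.decode('ascii')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decoding with the codec the bytes were actually written in succeeds.
pound_text = pound_bytes.decode('utf-8')
```

Here `error_message` holds the same "ordinal not in range(128)" complaint quoted in the question, while `pound_text` is the unicode pound sign.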

Martijn Pieters
  • Converting with str(obj) will cause the same problem with unicode characters, so you cannot just use str('some unicode char'). – Omid S. Dec 07 '17 at 02:04
  • @OmidS.: which is why there is a `try...except` case to **catch exactly that issue**. In Python 2, for *bytestrings*, `str('some bytes that encode non-ASCII codepoints')` is just fine. For a `unicode` object, `str(u'unicode string with non-ASCII codepoints')` will indeed fail, but the exception handler is there for exactly that case. – Martijn Pieters Dec 07 '17 at 08:05
-1

This is an old post, but I had the exact same problem. I ended up using the unicode function. It is a built-in function; you can read about it here.

So the only change is that instead of str(theThing) you use unicode(theThing). As the documentation says, it behaves like str except that it converts to a unicode string rather than an ASCII one.

A word of caution: if you are writing to files or doing something similar, you might run into problems there as well (at least I did), and this post fixed mine.

Omid S.
  • This does the wrong thing for the exact example the OP has: A bytestring with non-ASCII bytes, like `'£'`. – Martijn Pieters Dec 07 '17 at 06:51
  • If you *already* have a Unicode string, you’d have to test for that; since that’s the only exception it’s easier to use `str(...).decode(...)` for everything else. – Martijn Pieters Dec 07 '17 at 06:53
  • Well, I'm not much of a Python guy, but the documentation is pretty clear if you take a look (the "here" link, first paragraph): at least in Python 2.7, that exact function is there to serve that purpose. – Omid S. Dec 07 '17 at 15:59
  • The problem arises when you pass in something that contains non-ASCII bytes, which will fail decoding. – Martijn Pieters Dec 07 '17 at 16:35