Convert variable types into unicode string

Question

I'm looking for a way to convert a variable (which could be an ASCII string, a unicode string with extra characters like é or £, or a float or integer) into a unicode string.

variable.encode('utf-8') where variable is an integer results in AttributeError: 'int' object has no attribute 'encode'

str(variable).encode('utf-8') where variable is the string '£' results in UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

Is there an easy way to do what I'm looking for in Python 2.7? Or do I have to check the type of variable and process it differently?

Martijn Pieters · Answer 1 · 2016-10-30T22:55:20.587

4

Encoding would never result in a unicode object. You decode from bytes to unicode.

As such, you'd convert to str (a byte string) then to unicode by decoding:

str(obj).decode('utf8')

This will still fail for objects that are already unicode values containing non-ASCII characters, because str() first tries to encode them as ASCII, so you may want to use try..except to catch that case:

try:
    obj = str(obj).decode('utf8')
except UnicodeEncodeError:
    # already unicode
    pass
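
If relying on the exception feels too implicit, the type can be checked up front instead. The following is a minimal sketch, not part of the original answer: the helper name to_text and the text_type alias are hypothetical, and UTF-8 is assumed as the encoding of any byte strings. It is written to run on Python 2.7 (where bytes is an alias for str) and on Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper; the names are illustrative, not from the answer above.
text_type = type(u"")  # `unicode` on Python 2, `str` on Python 3


def to_text(obj, encoding="utf-8"):
    """Return obj as a text (unicode) string."""
    if isinstance(obj, text_type):
        # Already text: return it unchanged.
        return obj
    if isinstance(obj, bytes):
        # Byte string (`str` on Python 2): decode explicitly.
        # The encoding is an assumption about how the bytes were produced.
        return obj.decode(encoding)
    # Numbers and other objects: build the text representation directly.
    return text_type(obj)


pound_bytes = b"\xc2\xa3"  # UTF-8 bytes for the pound sign
print(to_text(pound_bytes) == u"\xa3")
print(to_text(42) == u"42")
```

Unlike the try..except version, this never calls str() on a unicode value, so no implicit ASCII encoding takes place.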

If you try to encode a byte-string, Python 2 implicitly first decodes to unicode for you, which is why you got your UnicodeDecodeError.
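
The implicit step can be made visible by spelling it out. This is a sketch under two assumptions: the default codec is ASCII (as on a stock Python 2.7), and the byte string holds UTF-8 data. Calling .encode('utf-8') on a Python 2 byte string behaves roughly like an ASCII decode followed by the encode, and it is the decode that fails.

```python
pound_bytes = b"\xc2\xa3"  # UTF-8 encoding of the pound sign

# The step Python 2 performs implicitly before encoding a byte string:
# decode with the default (ASCII) codec.
try:
    pound_bytes.decode("ascii")
    error_name = None
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__

print(error_name)

# Decoding with the codec the bytes were actually written in succeeds.
text = pound_bytes.decode("utf-8")
print(text == u"\xa3")
```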

edited Oct 30 '16 at 22:55

answered Oct 30 '16 at 22:27


Converting with str(obj) will cause the problem with unicode characters, so you cannot just use str('some unicode char') – Omid S. Dec 07 '17 at 02:04

@OmidS.: which is why there is a `try...except` case to **catch exactly that issue**. In Python 2, for *bytestrings*, `str('some bytes that encode non-ASCII codepoints')` is just fine. For a `unicode` object, `str(u'unicode string with non-ASCII codepoints')` will indeed fail, but the exception handler is there for exactly that case. – Martijn Pieters Dec 07 '17 at 08:05

score -1 · Answer 2 · answered Dec 07 '17 at 02:21


This is an old post, but I had the exact same problem. I ended up using the unicode function, a builtin that you can read about here.

So the only change is to use unicode(theThing) instead of str(theThing). As the documentation says, it behaves like str except that it converts to a unicode string rather than an ASCII one.

A word of caution: if you are writing to files or doing similar I/O, you might run into problems there as well (I did), and this post fixed mine.


Omid S.


This does the wrong thing for the exact example the OP has: a bytestring with non-ASCII bytes, like `'£'`. – Martijn Pieters Dec 07 '17 at 06:51
If you *already* have a Unicode string, you’d have to test for that; since that’s the only exception it’s easier to use `str(...).decode(...)` for everything else. – Martijn Pieters Dec 07 '17 at 06:53
Well, I'm not much of a Python guy, but the documentation is pretty clear if you take a look (the "here" link, first paragraph); at least in Python 2.7 that exact function is there to serve that purpose. – Omid S. Dec 07 '17 at 15:59
The problem arises when you pass in something that contains non-ASCII bytes, which will fail decoding. – Martijn Pieters Dec 07 '17 at 16:35
