x = ['Some strings.', 1, 2, 3, 'More strings!', 'Fanc\xc3\xbf string!']
y = [i.decode('UTF-8') for i in x]

What's the best way to convert the strings in x to Unicode? A list comprehension raises an AttributeError ('int' object has no attribute 'decode') because ints don't have a decode method.

I could use a for loop with a try/except. Or I could do explicit type checking in the list comprehension, but is type checking the right approach in a dynamic language like Python?
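
For reference, the loop-with-try idea might look roughly like the sketch below. The helper name `to_text` is made up for illustration, and the input uses `b'...'` literals so the same code behaves the same way on Python 2 and Python 3. Anything without a `decode` method (ints, and text strings on Python 3) is passed through unchanged.

```python
def to_text(items, encoding='UTF-8'):
    """Decode every item that has a decode method; keep the rest as-is.

    `to_text` is a hypothetical helper name, not a standard function.
    """
    result = []
    for item in items:
        try:
            # Only the attribute lookup is guarded, so errors raised
            # while decoding are not silently swallowed.
            decode = item.decode
        except AttributeError:
            # No decode method (for example an int): keep unchanged.
            result.append(item)
        else:
            result.append(decode(encoding))
    return result


x = [b'Some strings.', 1, 2, 3, b'More strings!', b'Fanc\xc3\xbf string!']
y = to_text(x)
```

Note that on Python 2 a `unicode` object also has a `decode` method, so already-decoded non-ASCII text in the list could raise a `UnicodeEncodeError` with this approach.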

UPDATE:

I would prefer that the ints remain ints, although this is not a strict requirement. My ideal output would be [u'Some strings.', 1, 2, 3, u'More strings!', u'Fancÿ string!'].

Buttons840
  • What is your desired output? `[u'Some strings', 1, 2, 3, u'More strings!']`, `[u'Some strings', u'1', u'2', u'3', u'More strings!']`, `[u'Some strings', u'More strings!']` ? – Anders Lindahl Mar 05 '12 at 17:35

2 Answers

You could use the unicode function:

>>> x = ['Some strings.', 1, 2, 3, 'More strings!']
>>> y = [unicode(i) for i in x]
>>> y
[u'Some strings.', u'1', u'2', u'3', u'More strings!']

UPDATE: since you specified that you want the integers to remain as-is, I would use this:

>>> y = [unicode(i) if isinstance(i, basestring) else i for i in x]
>>> y
[u'Some strings.', 1, 2, 3, u'More strings!']

Note: as @Boldewyn points out, if you want UTF-8, you should pass the encoding parameter to the unicode function:

unicode(i, encoding='UTF-8')
jterrace
  • This only works for ASCII (the `decode()` is there for a purpose). And it converts the numbers to type `unicode`. – Boldewyn Mar 05 '12 at 17:37
  • *Only* if you tell the Python interpreter via `-*- coding -*-` pragmas. And Unicode != UTF-8, sorry. – Boldewyn Mar 05 '12 at 17:39
  • @Boldewyn you can pass the `encoding` parameter to the `unicode` function and it does the exact same thing as `.decode` – jterrace Mar 05 '12 at 17:41
  • Yes, that's true. I forgot about this. But you should incorporate it in the answer. – Boldewyn Mar 05 '12 at 17:43
  • @jterrace: Very right, it's the same as `.decode()`, even to the extent that it won't work for integers any more. – Sven Marnach Mar 05 '12 at 17:45

If you want to keep the integers as they are in the list while just changing the strings to unicode, you can do

x = ['Some strings.', 1, 2, 3, 'More strings!']
y = [i.decode('UTF-8') if isinstance(i, basestring) else i for i in x]

which gets you

[u'Some strings.', 1, 2, 3, u'More strings!']
cjm
  • You could also do this using a loop and a try/except block, but I think this is tidier. – cjm Mar 05 '12 at 17:47
  • The try/except block would work on objects which have a decode method but are not instances of basestring. That preserves a feature of dynamic languages: you don't have to do a lot of type checking and fancy inheritance. – Buttons840 Mar 05 '12 at 18:14
  • Yeah, it's a compromise between brevity and programming using the dynamic philosophy. I'm of the mind that you should generally avoid using try/except for flow control if you can help it, but both solutions could be appropriate depending on your mindset/situation. – cjm Mar 05 '12 at 20:18
  • How can this be made safe for both Python 2/3? – Guillochon Jan 11 '17 at 19:45
  • There are a few suggestions here: http://stackoverflow.com/questions/11301138/how-to-check-if-variable-is-string-with-python-2-and-3-compatibility – cjm Jan 12 '17 at 00:04
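
On the Python 2/3 question above, one possible approach is to pick the byte-string type once, based on the interpreter version, and then reuse the `isinstance` check from the answer. This is only a sketch: `binary_type` and `decode_items` are names invented here for illustration, and the input is written with `b'...'` literals so it means the same thing on both versions.

```python
import sys

# Choose the byte-string type for the running interpreter.
# `binary_type` is a name made up for this sketch.
if sys.version_info[0] >= 3:
    binary_type = bytes
else:
    binary_type = str


def decode_items(items, encoding='UTF-8'):
    """Decode byte strings to text; leave every other item unchanged."""
    return [
        item.decode(encoding) if isinstance(item, binary_type) else item
        for item in items
    ]


x = [b'Some strings.', 1, 2, 3, b'More strings!', b'Fanc\xc3\xbf string!']
y = decode_items(x)
```

Items that are already text (`unicode` on Python 2, `str` on Python 3) are not instances of `binary_type`, so they pass through untouched, as do the ints.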