18

I have a Python list that looks like this:

list = [u'a', u'b', u'c']

Now I want to encode it in UTF-8. Therefore I thought I should use:

list = list[0].encode("utf-8")

But print list gives only

a

meaning the first element of the list, and not even a list anymore. What am I doing wrong?

Tom

4 Answers

53
>>> items =  [u'a', u'b', u'c']
>>> [x.encode('utf-8') for x in items]
['a', 'b', 'c']
jamylak
  • @user2401772 Jamylak is very fast :) – TerryA Jun 06 '13 at 08:33
  • Doesn't work if you consider: `>>> items = [u'ç', u'á', u'í'] >>> ['\xc3\xa7', '\xc3\xa1', '\xc3\xad']` – ePascoal Jul 14 '15 at 16:41
  • Sorry, my bad. I mean that if you take `>>> items = [u'ç', u'á', u'í']` and apply your suggestion `>>> [x.encode('utf-8') for x in items]`, the result is `>>> ['\xc3\xa7', '\xc3\xa1', '\xc3\xad']`. I expected something like ['ç','á','í']. Do you have any idea how to accomplish this, or why it doesn't work? – ePascoal Jul 14 '15 at 22:57
  • It is working: if you run `print '\xc3\xa7'` it will show you the character. – jamylak Jul 15 '15 at 08:06
  • Great resource! – William Romero Aug 30 '19 at 04:42
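
The point made in the comments above can be checked with a small round trip. This is only a sketch, and it assumes Python 3, where `encode` returns `bytes` objects (shown with a `b` prefix) rather than the Python 2 `str` seen in the thread. The names `encoded` and `decoded` are illustrative.

```python
# Non-ASCII items, as in the comment thread.
items = [u'ç', u'á', u'í']

# Encoding each element yields the UTF-8 bytes of that character.
encoded = [x.encode('utf-8') for x in items]

# The escaped form is just how byte strings are displayed inside a list;
# decoding the bytes recovers the original characters.
decoded = [b.decode('utf-8') for b in encoded]

print(encoded)
print(decoded)
```

The escapes such as `\xc3\xa7` are the repr of the encoded bytes, not a sign that the encoding failed.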
10

list[0] is the first element, not a list. You are reassigning your list variable to a new value: the UTF-8 encoding of the first element.

Also, don't name your variables list, as it shadows the built-in list() function.
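
To make the difference concrete, here is a minimal sketch contrasting the question's assignment with an element-by-element encoding. The variable names `items`, `first_only`, and `encoded` are illustrative, not from the question, and under Python 3 the encoded values are `bytes` objects.

```python
# Renamed from `list` so the built-in is not shadowed.
items = [u'a', u'b', u'c']

# What the question's code does: it encodes only the first element,
# so the result is a single encoded string, not a list.
first_only = items[0].encode("utf-8")

# Encoding every element while keeping a list.
encoded = [x.encode("utf-8") for x in items]

print(first_only)
print(encoded)
```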

njzk2
  • While the current accepted answer includes code examples that work, this answer responds to OP's question, "What am I doing wrong", and even takes it a step further to warn about `list` usage. – Kurt Jun 24 '21 at 15:50
1

If you are looking for the output as a clean list without the u prefixes:

import unicodedata

list1 = [u'a', u'b', u'c']
clean_list1 = [unicodedata.normalize("NFKD", x) for x in list1]
print(clean_list1)

Output (under Python 3, where every str is already Unicode; normalize returns text, not encoded bytes):

['a', 'b', 'c']
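
As a hedged illustration of how this differs from encoding, the sketch below applies both operations to a single non-ASCII character. It assumes Python 3; the names `text`, `normalized`, and `encoded` are illustrative.

```python
import unicodedata

text = u'\u00e7'  # 'ç' as a single code point

# NFKD normalization decomposes the character into 'c' plus a
# combining cedilla; the result is still a text string.
normalized = unicodedata.normalize("NFKD", text)

# Encoding converts the text into UTF-8 bytes.
encoded = text.encode("utf-8")

print(len(normalized))
print(encoded)
```

Normalization changes which code points represent the text, while `encode` changes the type from text to bytes, so the two are not interchangeable.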
WGYTA
0

You need to encode your string, not decode it. The list you are given consists of Unicode strings. Representing a Unicode string as a string of bytes is known as encoding; use u'...'.encode. Then, by using string.split(), you can break the encoded string down into smaller chunks (strings).
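
A rough sketch of that encode-then-split idea follows. The space-separated input `text` is a made-up example, not data from the question, and under Python 3 the chunks are `bytes` objects.

```python
# Hypothetical input: one Unicode string holding space-separated items.
text = u'a b c'

# Encode the whole string to UTF-8 first...
encoded = text.encode('utf-8')

# ...then split the encoded string on whitespace into smaller chunks.
chunks = encoded.split()

print(chunks)
```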

Paul Brennan
Niko