How to strip unicode in a list

Question

I want to strip the unicode strings in a list, for example airports [u'KATL',u'KCID']

expected output

[KATL,KCID]

Followed the below link

I tried one of the solutions:

>>> my_list = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
>>> map(str.strip, my_list)
['this', 'is', 'a', 'list', 'of', 'words']

but got the following error:

TypeError: descriptor 'strip' requires a 'str' object but received a 'unicode'

Am I understanding correctly that you want to remove the `u` from your strings?! You're barking up the wrong tree then, it's not something you need to remove, it's only indicating that you're dealing with `unicode` strings, not byte strings. That is not a problem you need to solve. — deceze, Jul 27 '17 at 14:21
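To illustrate the point of the comment above: the `u` is part of how Python 2 displays a unicode string inside a list, not part of the text itself. A minimal sketch in Python 3 (where the `u` prefix is accepted but has no effect) that builds the exact `[KATL,KCID]` text from the question; the variable name `formatted` is only illustrative:

```python
# The u prefix is legal in Python 3.3+ and does not change the string.
airports = [u'KATL', u'KCID']

# The elements already hold plain text; only the list's repr adds quotes
# (and, in Python 2, the u marker). Build the wanted display by hand.
formatted = '[' + ','.join(airports) + ']'
print(formatted)
```

In Python 2, printing a single element (`print airports[0]`) likewise shows `KATL` without any prefix.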

randomir · Accepted Answer · 2017-07-27T14:21:08.967

12

First, I strongly suggest you switch to Python 3, which treats Unicode strings as first-class citizens (all strings are Unicode strings, but they are called str).

But if you have to make it work in Python 2, you can strip unicode strings with unicode.strip (if your strings are true Unicode strings):

>>> lst = [u'KATL\n', u'KCID\n']
>>> map(unicode.strip, lst)
[u'KATL', u'KCID']
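For comparison, a small sketch of the same stripping in Python 3, where every string is a `str`, so `str.strip` applies directly. Note that `map` returns a lazy iterator there, so it is wrapped in `list` (the name `stripped` is just for illustration):

```python
lst = ['KATL\n', 'KCID\n']

# In Python 3 map() is lazy, so materialize the result with list().
stripped = list(map(str.strip, lst))
print(stripped)
```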

If your unicode strings are limited to the ASCII subset, you can convert them to str with:

>>> lst = [u'KATL', u'KCID']
>>> map(str, lst)
['KATL', 'KCID']

Note that this conversion will fail for non-ASCII strings. To encode Unicode code points as a str (a string of bytes), you have to choose an encoding (usually UTF-8) and call the .encode() method on your strings:

>>> lst = [u'KATL', u'KCID']
>>> map(lambda x: x.encode('utf-8'), lst)
['KATL', 'KCID']
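A hedged sketch of the same idea in Python 3, where `.encode()` returns `bytes`. The second list entry is a made-up non-ASCII example, not from the question, added to show that an ASCII-only conversion fails while UTF-8 succeeds:

```python
# 'Z\u00fcrich' is a hypothetical non-ASCII entry used for illustration.
names = ['KATL', 'Z\u00fcrich']

# UTF-8 can encode every string in the list.
utf8 = [s.encode('utf-8') for s in names]

# Encoding the non-ASCII entry as ASCII raises UnicodeEncodeError.
try:
    names[1].encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(utf8, ascii_ok)
```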

edited Jul 27 '17 at 14:21

answered Jul 27 '17 at 14:08

Still I am not able to convert. I ran `import pdb; pdb.set_trace()` and then `map(unicode.strip, airports)` (to strip all the elements of a string list); the output of `(Pdb++) pp airports` is `[u'KATL']` – Hariom Singh Jul 27 '17 at 14:15
I don't see any error message, but the list still contains the same unicode strings – Hariom Singh Jul 27 '17 at 14:18
You actually want to convert unicodes to strings, try the second example from the answer, `map(str, lst)`. – randomir Jul 27 '17 at 14:18
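One further detail that may explain the debugger session in the comments above (an assumption, since the full code is not shown): `map` builds a new sequence and never modifies the original list, so the result has to be assigned to a name. A short Python 3 sketch:

```python
airports = ['KATL\n']

# The result is discarded here, so airports itself is untouched.
list(map(str.strip, airports))
unchanged = airports == ['KATL\n']

# Rebinding the name to the new list keeps the stripped values.
airports = [s.strip() for s in airports]
print(unchanged, airports)
```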

score 3 · Answer 2 · answered Jul 27 '17 at 14:29

The only reliable way to convert a unicode string to a byte string is to encode it with a suitable encoding (ASCII, Latin-1 and UTF-8 are the most common ones). By definition, UTF-8 is able to encode any unicode character, but the resulting byte string will contain non-ASCII bytes for such characters, and its size in bytes will no longer be the number of (unicode) characters. Latin-1 is able to represent the characters of most Western European languages with one byte per character, and ASCII is the set of characters that are always correctly represented.

If you want to be able to process strings containing characters not representable in the chosen charset, you can use the parameter errors='ignore' to just remove them, or errors='replace' to replace them with a replacement character, often ?.

So if I have correctly understood your requirement, you could translate the list of unicode string into a list of byte strings with:

[ x.encode('ascii', errors='replace') for x in my_list ]
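The differences described above can be checked on a single word. This is a sketch in Python 3 using an illustrative string (`'caf\u00e9'`, not from the question); it compares byte lengths across encodings and the two error handlers:

```python
word = 'caf\u00e9'  # four characters, the last one is non-ASCII

utf8 = word.encode('utf-8')      # the last character takes two bytes
latin1 = word.encode('latin-1')  # one byte per character

# ASCII cannot represent the last character, so an error handler decides.
ignored = word.encode('ascii', errors='ignore')    # character dropped
replaced = word.encode('ascii', errors='replace')  # character becomes ?

print(len(word), len(utf8), len(latin1), ignored, replaced)
```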

score 2 · Answer 3 · answered Jul 27 '17 at 14:23

A listcomp seems the simplest solution:

[s.strip() for s in my_list]

If you're keen to use a map, I'd use a lambda to get the object's own strip method rather than demanding that it be the strip of one particular type (str or unicode).

map(lambda s: s.strip(), my_list)
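A sketch of why calling each object's own method is more forgiving, written for Python 3, where the analogous pair of types is `str` and `bytes` rather than `unicode` and `str`. The mixed list is a contrived example:

```python
# A contrived list mixing text and byte strings.
mixed = ['KATL\n', b'KCID\n']

# Each element supplies its own strip(), so both types work.
stripped = [s.strip() for s in mixed]

# Pinning the method to str fails on the bytes element with a TypeError,
# much like the error quoted in the question.
try:
    list(map(str.strip, mixed))
    failed = False
except TypeError:
    failed = True

print(stripped, failed)
```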
