8

In Python, if I have a string like:

a = " Hello - to - everybody"

And I do

a.split('-')

then I get

[u'Hello', u'to', u'everybody']

This is just an example.

How can I get a plain list without that annoying u'' prefix?

Aditya
SagittariusA
  • 16
    First [understand what the `u''` is](http://docs.python.org/2/howto/unicode.html#unicode-literals-in-python-source-code) – Michael Berkowski Feb 02 '13 at 17:01
  • This shows how to convert: http://stackoverflow.com/questions/1207457/convert-unicode-to-string-in-python-containing-extra-symbols – fecub Feb 02 '13 at 17:05
  • 2
    Is this your real code? You split a string, and the delimiter is also a string, then the result should be a list of strings, not a list of unicodes. – nymk Feb 02 '13 at 17:19
  • 1
    @nymk I imagine that the asker is using Django, which tends to make everything Unicode wherever possible due to its strong support for different character sets, and they have incorrectly simplified the question down. – Gareth Latty Feb 02 '13 at 18:39

2 Answers

21

The u means that it's a unicode string - your original string must also have been a unicode string. Generally it's a good idea to keep strings as Unicode, because converting to normal (byte) strings can fail when a character has no equivalent in the target encoding.

The u is purely used to let you know it's a unicode string in the representation - it will not affect the string itself.
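A minimal sketch of that point, assuming the asker's data is equivalent to the `u"Hello-to-everybody"` literal used here (an illustrative stand-in, not their real input). The prefix only shows up when the list's representation is displayed; the text itself is unchanged:

```python
parts = u"Hello-to-everybody".split(u"-")

# repr() is what the interactive prompt shows for a list. On Python 2 each
# element is displayed with a u prefix; on Python 3 it is not.
shown = repr(parts)

# The prefix is not part of the text: the first element compares equal to
# the plain literal and has the expected length.
first_is_hello = parts[0] == "Hello"
first_length = len(parts[0])

# Joining or printing the elements never produces a "u" in the output.
joined = u", ".join(parts)
print(joined)
```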

In general, unicode strings work exactly as normal strings, so there should be no issue with leaving them as unicode strings.

In Python 3.x, unicode strings are the default and are shown without the u prefix (instead, bytes, the equivalent of old strings, are shown with a b prefix).
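For comparison, here is roughly what that looks like on Python 3 (this sketch is Python 3 only; the sample text and the UTF-8 encoding are assumptions for illustration):

```python
text = "Hello-to-everybody"        # str is Unicode by default on Python 3
parts = text.split("-")
parts_repr = repr(parts)           # "['Hello', 'to', 'everybody']"

data = text.encode("utf-8")        # bytes, the counterpart of the old str
byte_parts = data.split(b"-")      # bytes must be split on a bytes delimiter
byte_parts_repr = repr(byte_parts) # "[b'Hello', b'to', b'everybody']"
```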

If you really, really need to convert to a normal string (rarely the case, but potentially an issue if you are using an extension library that doesn't support unicode strings, for example), take a look at unicode.encode() (and str.decode() for the reverse direction). You can either do this before the split, or after the split using a list comprehension.
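A hedged sketch of the "after the split" variant. The choice of UTF-8 is an assumption; use whatever encoding the receiving library expects. On Python 2 the encoded items are plain `str`, on Python 3 they are `bytes`:

```python
parts = u"Hello-to-everybody".split(u"-")

# Encode each piece after the split with a list comprehension.
encoded = [p.encode("utf-8") for p in parts]

# A narrower codec can fail when a character has no equivalent in it.
try:
    u"\u7b2c\u4e00\u56de".encode("ascii")
    failed = False
except UnicodeEncodeError:
    failed = True
```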

Gareth Latty
1

I had the opposite problem. The str '第一回\u3000甄士隐梦幻识通灵 贾雨村风尘怀闺秀' needs to be split on the unicode character. But I made a mistake and wrote split('\u'), which led to a unicode syntax error.

I should have written split('\u3000').
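As a small sketch of the working version (the variable names are illustrative), the escape has to be complete, i.e. `\u` followed by exactly four hex digits:

```python
# U+3000 is the ideographic (full-width) space between the two parts.
title = u"第一回\u3000甄士隐梦幻识通灵 贾雨村风尘怀闺秀"

# Split on the full escape sequence; a bare '\u' is a syntax error.
chapter, heading = title.split(u"\u3000")
```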

Youth overturn