You are overthinking this. You use a unicode literal to make your `unicode` object, and then your `splits` list will contain `unicode` objects:
In [4]: def name():
   ...:     text = u'Obras de revisão e recuperação (45453000-7)'
   ...:     splits = text.split(u" (")
   ...:     return splits
   ...:
In [5]: splits = name()
In [6]: splits
Out[6]: [u'Obras de revis\xe3o e recupera\xe7\xe3o', u'45453000-7)']
When a `list` is printed to the screen, the `__repr__` of each object it contains is used, which is why you see the escape sequences above. If you want the `__str__` form instead, just `print` each item:
In [7]: for piece in splits:
   ...:     print(piece)
   ...:
Obras de revisão e recuperação
45453000-7)
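The same container-versus-element distinction can be sketched in Python 3. This is an assumption on my part, since the session above is Python 2; the built-in `ascii()` is used here only to produce escaped output comparable to Python 2's `repr` of a `unicode` object.

```python
# Python 3 sketch: a list's string form is built from the repr of each element.
splits = 'Obras de revisão e recuperação (45453000-7)'.split(' (')

list_form = str(splits)          # repr() of each element, so quotes appear
element_form = str(splits[0])    # the plain text itself
escaped_form = ascii(splits[0])  # non-ASCII escaped, like Python 2's repr

print(list_form)
print(element_form)
print(escaped_form)
```

The data is never altered by how the list is displayed; only the representation differs.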
Note that `.encode` returns a byte string, i.e. a regular, non-unicode Python 2 `str`. Calling `str` on the result is essentially the identity function, because it is already a `str` once you `encode` it:
In [8]: splits[0].encode('utf8')
Out[8]: 'Obras de revis\xc3\xa3o e recupera\xc3\xa7\xc3\xa3o'
In [9]: str(splits[0].encode('utf8'))
Out[9]: 'Obras de revis\xc3\xa3o e recupera\xc3\xa7\xc3\xa3o'
You should really, really consider using Python 3, which streamlines this. `str` in Python 3 corresponds to Python 2 `unicode`, and Python 2 `str` corresponds to Python 3 `bytes` objects.
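As a rough illustration of that correspondence, here is a minimal Python 3 sketch (the variable names are mine, not from the question):

```python
# Python 3 sketch of the str/bytes split.
text = 'Obras de revisão e recuperação (45453000-7)'  # str: Unicode text, no u prefix

encoded = text.encode('utf8')     # bytes, the counterpart of a Python 2 str
decoded = encoded.decode('utf8')  # back to str

print(type(text).__name__, type(encoded).__name__)
print(decoded.split(' (')[0])
```

One difference to keep in mind: in Python 3, calling `str()` on a `bytes` object without an encoding does not give back the text but its `b'...'` representation, so decode explicitly.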
So, to clarify things, your `name` function should work like this:
In [16]: def name():
    ...:     text = u'Obras de revisão e recuperação (45453000-7)'
    ...:     splits = text.split(u" (")
    ...:     return splits[0]
    ...:
In [17]: print(name())
Obras de revisão e recuperação