
I have created some lists containing special (accented) characters. However, when I print those lists, the characters come out as escape codes instead of the letters I typed.

#!/usr/bin/env python
#-*- coding: utf-8 -*-

# My lists
geometriaAproximada = ['Sim', 'Não']
regime = ['Permanente', 'Permanente com grande variação', 'Temporário',
          'Temporário com leito permanente', 'Seco']
tipomassadagua = ['Oceano', 'Baía', 'Enseada', 'Meandro abandonado',
                  'Lago/Lagoa', 'Represa/Açude', 'Desconhecida']
vegetacao_nivel_1 = ['Manguezal', 'Restinga', 'Brejo Litoraneo', 'Mussununga',
                     'Vegetação com influência fluvial e/ou lacustre', 'Compo Rupestre',
                     'Floresta Estacional', 'Cerrado', 'Caatinga', 'Áreas Antropizadas',
                     "Rios, Lagos, Lagoas, e Corpos d'água"]
vegetacao_nivel_2 = ['Arbustiva/Arbórea', 'Apicum', 'Herbáceo-Arbustivo', 'Arbustiva', 'Herbácea',
                     'Terras baixas', 'Aluvial', 'de Altitude (Submontana ou Montana)', 'Decidual', 'Semidecidual',
                     'Tipo biogeográfico de Cerrado', 'Tipo biogeográfico de Caatinga']
vegetacao_nivel_3 = ['Estágio secundário inicial de regeneração',
                     'Estágio secundário médio de regeneração',
                     'Estágio primário e/ou secundário avançado de regeneração',
                     'Mata de Cipó', 'Terras baixas', 'de Altitude (Submontana/Montana)', 'Aluvial',
                     'Florestado (Cerradão)', 'Arborizado (Stricto sensu)', 'Parque (Campo cerrado)', 'Campo Limpo',
                     'Vereda', 'Floresta de galeria',
                     'Florestada', 'Arborizada', 'Parque', 'Gramínio-lenhosa']

When I print one of them:

print regime

['Permanente', 'Permanente com grande varia\xc3\xa7\xc3\xa3o', 'Tempor\xc3\xa1rio', 'Tempor\xc3\xa1rio com leito permanente', 'Seco']

What can I do to correct it?

dogosousa
  • How about pulling most of the examples out and focusing on one short list? – tdelaney Nov 01 '16 at 20:35
  • Duplicate? http://stackoverflow.com/questions/3597480/how-to-make-python-3-print-utf8#3603160 or for 2.7: http://stackoverflow.com/questions/5203105/printing-a-utf-8-encoded-string – kabanus Nov 01 '16 at 20:35
  • `print` can only print text. If you give it something else, it tries to convert it to an **unambiguous** string: for a list it adds `[`, `]` and quotation marks (for strings), and shows non-ASCII characters as hex escape codes so you can see exactly which bytes (encoding) were used. So it is not an error but intentional behavior. If you need readable text, you have to convert the list to a string yourself. – furas Nov 01 '16 at 21:03
  • It can also be a different problem: `print` automatically tries to convert text to the encoding used by the console. If the console doesn't use UTF-8, you may sometimes see hex codes instead of Unicode characters. – furas Nov 01 '16 at 21:08

1 Answer


Unicode was bolted onto Python 2 well after the language was cooked. The str object in particular can hold either binary encoded data or ASCII string data, and you just kinda have to know which is which in your program. The unicode type was added later and does what you'd expect: it holds wide text characters capable of expressing the Unicode code set.

To keep matters confusing, your console and text editors likely support UTF-8 natively, so a string holding encoded UTF-8 octets may look right when you view it. These two strings look the same, but repr shows that they are different. The first one needs decoding to become a Python unicode string:

>>> s = 'Permanente com grande variação'
>>> u = u'Permanente com grande variação'
>>> print repr(s)
'Permanente com grande varia\xc3\xa7\xc3\xa3o'
>>> print repr(u)
u'Permanente com grande varia\xe7\xe3o'
>>> s_decode = s.decode('utf-8')
>>> print repr(s_decode)
u'Permanente com grande varia\xe7\xe3o'

So the first part of your problem is solved by writing your strings as unicode literals (with the u prefix) from the start:

u'Permanente com grande variação'
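If the lists already exist as UTF-8 encoded strings (for example, read from a file), they can be decoded in one pass with the same `decode('utf-8')` call used above; in Python 2 that is just `[item.decode('utf-8') for item in regime]`. The sketch below is written for Python 3, where `bytes` stands in for Python 2's encoded str; the names `regime_encoded` and `decode_all` are hypothetical, chosen only for illustration.

```python
# Python 3 sketch: `bytes` plays the role that an encoded Python 2 str plays
# in the question (UTF-8 octets rather than text).
regime_encoded = [
    'Permanente'.encode('utf-8'),
    'Permanente com grande variação'.encode('utf-8'),
    'Temporário'.encode('utf-8'),
]


def decode_all(items, encoding='utf-8'):
    """Hypothetical helper: return a new list with every byte string decoded to text."""
    return [item.decode(encoding) for item in items]


regime = decode_all(regime_encoded)
print(regime_encoded[1])  # the raw octets, shown with \xNN escapes
print(regime[1])          # the decoded text
```

The comprehension builds a new list and leaves the encoded one untouched; pass a different `encoding` only if you know the data was not written as UTF-8.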

The second problem is that when you print a list, Python prints the repr of its members, so your strings will still show escape codes. This doesn't really need fixing; that's just the normal way Python prints lists. If you want something tidier, you have to build your own output, as in:

>>> mylist = [u'Permanente com grande variação', u'Vegetação com influência']
>>> print mylist
[u'Permanente com grande varia\xe7\xe3o', u'Vegeta\xe7\xe3o com influ\xeancia']
>>> print u', '.join(mylist)
Permanente com grande variação, Vegetação com influência

Finally, Python 3 has been out for a very long time, and non-English speakers should be especially happy that it's done a great job of handling international character sets.
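For comparison, here is a minimal Python 3 sketch using the `regime` list from the question. In Python 3, str is already Unicode text, so neither the u prefix nor `decode()` is needed, and the list's repr keeps printable accented characters instead of turning them into `\xNN` escapes. It still assumes the console can display UTF-8, as furas points out in the comments.

```python
# Python 3: string literals are Unicode text by default.
regime = ['Permanente', 'Permanente com grande variação', 'Temporário',
          'Temporário com leito permanente', 'Seco']

# The list repr shows printable non-ASCII characters as-is (no \xNN escapes).
print(regime)

# Joining still gives the tidier, bracket-free output.
print(', '.join(regime))
```

With a UTF-8 console, the first `print` shows `['Permanente', 'Permanente com grande variação', 'Temporário', 'Temporário com leito permanente', 'Seco']`.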

Python is dead... long live Python!

tdelaney