I have a list of strings that I am working with in Python, and some of the strings contain special characters such as ü, ä, and ö.

I can think of two solutions:

  1. Post-process the acquired data by replacing the offending substrings in the list of strings.
  2. Decode the acquired data in Python when the list is built.

    lista_names_d = [ 'L\xc3\xbcneburg Bockelsberg 2', 'L\xc3\xbcneburg Bockelsberg 1', 'L\xc3\xbcneburg Bockelsberg 3','L\xc3\xbcneburg Bockelsberg 5' ]

I tried this:

    lista_names_d = [name.replace('\xc3\xbc', 'ü') for name in lista_names_d]

This does nothing.

I also tried this:

    your_unicode_string = "L\xc3\xbcneburg Kaltenmoor BHKW 1"
    correct_unicode_string = your_unicode_string.encode('latin1').decode('utf8')

Error:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

Any help is highly appreciated.

may

2 Answers

What about using the function unicode? This code prints the proper accents:

    lista_names_d = [ 'L\xc3\xbcneburg Bockelsberg 2', 'L\xc3\xbcneburg Bockelsberg 1', 'L\xc3\xbcneburg Bockelsberg 3','L\xc3\xbcneburg Bockelsberg 5' ]

    for item in lista_names_d:
        print(unicode(item, 'utf-8'))
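
A note on the `UnicodeEncodeError` reported in the comments: that error comes from encoding, not decoding, so it is most likely raised by `print` when it converts the decoded string for a console or pipe whose encoding is ASCII. The sketch below rests on that assumption and is not a confirmed diagnosis. It shows that the decode step succeeds and that only an ASCII encode fails. It uses `b'...'` literals so the same lines run on Python 2.7 and Python 3.

```python
import sys

# UTF-8 bytes, as in the question (b'' is accepted by Python 2.7 and 3).
raw = b'L\xc3\xbcneburg Bockelsberg 2'

# Decoding itself succeeds and yields text containing u'\xfc'.
text = raw.decode('utf-8')

# This is the encoding print uses for text; it may be 'ascii' or None.
print(getattr(sys.stdout, 'encoding', None))

# Reproduce the failure in isolation: ASCII cannot represent u'\xfc'.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly to UTF-8 gives bytes that are safe to write out.
safe_bytes = text.encode('utf-8')
```

If the reported encoding is `ascii` (or `None` on Python 2, which happens when output is piped), the usual workarounds are to encode explicitly before printing, for example `print text.encode('utf-8')` on Python 2, or to set the `PYTHONIOENCODING=utf-8` environment variable before starting the interpreter.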
Klaymen
  • Gives me the following error: Traceback (most recent call last): File "", line 2, in UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 1: ordinal not in range(128) – may Dec 20 '17 at 10:38
  • You might try adding this declaration as the first line of your source file: `# -*- coding: utf-8 -*-` – Klaymen Dec 20 '17 at 10:39
  • I added it. Still errors, unfortunately! UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 1: ordinal not in range(128) – may Dec 20 '17 at 10:46
  • Have you tried the example above as a standalone script? I'm asking because there is no \xfc character explicitly in that code (although \xc3\xbc in UTF-8 is \xfc in Latin-1). – Klaymen Dec 20 '17 at 11:11
  • Yes, I did, and I still get the error containing u'\xfc'. I also added the `# -*- coding: utf-8 -*-` header and am reading the documentation to get some clarity on the issue. – may Dec 20 '17 at 12:05
  • Which Python version are you using, exactly? – Klaymen Dec 20 '17 at 12:44
  • I use 2.7 exactly @Klaymen – may Dec 20 '17 at 12:56
Check the encoding documentation:

    for city in lista_names_d:
        print city.decode('utf8')
    # Lüneburg Bockelsberg 2
    # Lüneburg Bockelsberg 1
    # Lüneburg Bockelsberg 3
    # Lüneburg Bockelsberg 5

From the official documentation:

    >>> u = unichr(40960) + u'abcd' + unichr(1972)   # Assemble a string
    >>> utf8_version = u.encode('utf-8')             # Encode as UTF-8
    >>> type(utf8_version), utf8_version
    (<type 'str'>, '\xea\x80\x80abcd\xde\xb4')
    >>> u2 = utf8_version.decode('utf-8')            # Decode using UTF-8
    >>> u == u2                                      # The two strings match
    True
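
If it is not certain whether the list holds raw UTF-8 bytes or text that was already mis-decoded as Latin-1 (the case that the `.encode('latin1').decode('utf8')` attempt in the question targets), a small helper can cover both. This is only a sketch: `to_text` is a hypothetical name used for illustration, and it assumes the underlying data really is UTF-8.

```python
def to_text(value, encoding='utf-8'):
    """Return text for UTF-8 bytes or for text mis-decoded as Latin-1."""
    if isinstance(value, bytes):  # Python 2 str, Python 3 bytes
        return value.decode(encoding)
    try:
        # Map each code point back to its byte value, then decode properly.
        return value.encode('latin-1').decode(encoding)
    except UnicodeError:
        # Already correct text, or not representable in Latin-1: keep as is.
        return value


# One raw byte string and one mis-decoded text string, for illustration.
lista_names_d = [
    b'L\xc3\xbcneburg Bockelsberg 2',
    u'L\xc3\xbcneburg Bockelsberg 1',
]
names = [to_text(name) for name in lista_names_d]
```

Both entries come out as text containing `u'\xfc'`. The fallback branch means a string that is already correct, such as `u'L\xfcneburg'`, passes through unchanged. Text that legitimately contains a sequence like `Ã¼` would be altered, so the helper is best applied only to data known to be affected.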
Gsk
  • Does this work for you? I get an error: Traceback (most recent call last): File "", line 2, in UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 1: ordinal not in range(128) – may Dec 20 '17 at 10:39