Replacing substrings in a list of strings or decoding

Question

I have a list of strings that I am working with in Python. Some of the strings contain special characters: ü, ä, ö and so on.

I can think of 2 possible solutions:

1. Treating the acquired data afterwards by replacing the substrings in the list of strings.
2. Decoding what is acquired in the list in Python.

lista_names_d = [ 'L\xc3\xbcneburg Bockelsberg 2', 'L\xc3\xbcneburg Bockelsberg 1', 'L\xc3\xbcneburg Bockelsberg 3','L\xc3\xbcneburg Bockelsberg 5' ]

I tried this

lista_names_d = [name.replace('\xc3\xbc', 'ü') for name in lista_names_d]

This does nothing

I tried this

your_unicode_string = "L\xc3\xbcneburg Kaltenmoor BHKW 1"
correct_unicode_string = your_unicode_string.encode('latin1').decode('utf8')

error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
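For comparison, here is a minimal sketch of the same round trip on Python 3, where a plain string literal is already text rather than bytes. It assumes the string was produced by reading UTF-8 bytes as Latin-1; the variable names are illustrative only.

```python
# Python 3 sketch: repair text whose UTF-8 bytes were misread as Latin-1.
mojibake = "L\xc3\xbcneburg Kaltenmoor BHKW 1"  # text with two wrong characters

# Latin-1 maps code points 0-255 one-to-one onto bytes,
# so encoding with it recovers the original raw bytes.
raw_bytes = mojibake.encode("latin1")

# Decoding those raw bytes as UTF-8 yields the intended text.
repaired = raw_bytes.decode("utf8")
print(repaired)
```

On Python 2 the same literal is a byte string, so calling `.encode('latin1')` on it first triggers an implicit ASCII decode, which is consistent with the UnicodeDecodeError reported above.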

Any help is highly appreciated.

@StevenBENET I collect the data from SQL but process it in Python, and I use Zeppelin, so I guess Python 3 – may Dec 20 '17 at 10:34

Klaymen · Answer 1 · 2017-12-20T10:46:24.710

What about using the function unicode? This code prints the proper accents:

lista_names_d = [ 'L\xc3\xbcneburg Bockelsberg 2', 'L\xc3\xbcneburg Bockelsberg 1', 'L\xc3\xbcneburg Bockelsberg 3','L\xc3\xbcneburg Bockelsberg 5' ]

for item in lista_names_d:
    print(unicode(item, 'utf-8'))

edited Dec 20 '17 at 10:46

answered Dec 20 '17 at 10:35

Klaymen


Gives me the following error: Traceback (most recent call last): File "", line 2, in UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 1: ordinal not in range(128) – may Dec 20 '17 at 10:38

You might try declaring the source encoding on the first line of your code: `# -*- coding: utf-8 -*-` – Klaymen Dec 20 '17 at 10:39
I added it. Still errors, unfortunately! UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 1: ordinal not in range(128) – may Dec 20 '17 at 10:46

Have you tried the example above as standalone? I'm asking, because there is no \xfc char in that code explicitly. (however \xc3\xbc in utf-8 is \xfc in latin-1) – Klaymen Dec 20 '17 at 11:11
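The byte relationship mentioned in this comment can be checked directly. This is a small Python 3 sketch (the thread itself turns out to concern Python 2.7); the variable names are illustrative.

```python
# The same character, ü (U+00FC), has different byte encodings.
u_umlaut = "\u00fc"

utf8_bytes = u_umlaut.encode("utf-8")      # two bytes in UTF-8
latin1_bytes = u_umlaut.encode("latin-1")  # one byte in Latin-1

print(utf8_bytes, latin1_bytes)
```

So an error message naming u'\xfc' refers to the already decoded character ü, not to the raw bytes in the list.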
Yes, I did, and I still get the error containing u'\xfc'. I also added the `# -*- coding: utf-8 -*-` header and am reading the documentation to get some clarity on the issue – may Dec 20 '17 at 12:05

What is your python version exactly? – Klaymen Dec 20 '17 at 12:44
I use 2.7 exactly @Klaymen – may Dec 20 '17 at 12:56
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/161642/discussion-between-klaymen-and-mayra). – Klaymen Dec 20 '17 at 17:26

Gsk · Answer 2 · 2017-12-20T11:51:56.227

Check the encoding documentation. Decode each byte string as UTF-8:

for city in lista_names_d:
    print city.decode('utf8')
# Lüneburg Bockelsberg 2
# Lüneburg Bockelsberg 1
# Lüneburg Bockelsberg 3
# Lüneburg Bockelsberg 5

From the official documentation:

>>> u = unichr(40960) + u'abcd' + unichr(1972)   # Assemble a string
>>> utf8_version = u.encode('utf-8')             # Encode as UTF-8
>>> type(utf8_version), utf8_version
(<type 'str'>, '\xea\x80\x80abcd\xde\xb4')
>>> u2 = utf8_version.decode('utf-8')            # Decode using UTF-8
>>> u == u2                                      # The two strings match
True
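The documentation excerpt above is written for Python 2. A rough Python 3 equivalent, offered only as a sketch, replaces `unichr` with `chr` and drops the `u''` prefix; the result of `encode` is then a `bytes` object instead of a `str`.

```python
# Python 3 sketch of the same encode/decode round trip.
u = chr(40960) + "abcd" + chr(1972)     # assemble a text string
utf8_version = u.encode("utf-8")        # encode as UTF-8, giving bytes
print(type(utf8_version).__name__, utf8_version)

u2 = utf8_version.decode("utf-8")       # decode the bytes back to text
print(u == u2)                          # the two strings match
```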

edited Dec 20 '17 at 11:51

answered Dec 20 '17 at 10:37

Gsk


Does this work for you? I get an error: Traceback (most recent call last): File "", line 2, in UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 1: ordinal not in range(128) – may Dec 20 '17 at 10:39
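The error in this comment names the 'ascii' codec failing to encode u'\xfc', which suggests the decode itself succeeded and the failure happens when the decoded text is written to an ASCII-only output. That is an assumption about the environment, not something confirmed in the thread. The Python 3 sketch below uses `bytes` literals to stand in for the Python 2 byte strings and reproduces the two steps separately; all names are illustrative.

```python
# Byte strings as they might arrive from the database (assumed UTF-8).
names = [b"L\xc3\xbcneburg Bockelsberg 2", b"L\xc3\xbcneburg Bockelsberg 1"]

# Step 1: decoding UTF-8 bytes to text succeeds.
decoded = [n.decode("utf-8") for n in names]

# Step 2: encoding that text as ASCII fails, because ü is outside ASCII.
try:
    decoded[0].encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print("ASCII output fails:", exc)

# Encoding explicitly as UTF-8 for output avoids the ASCII codec.
utf8_output = [s.encode("utf-8") for s in decoded]
```

If this assumption holds, the fix would lie in the output encoding of the console or notebook rather than in the decoding step.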
