String changes its encoding after being appended to the list

Question

Good day to everyone!

First of all, the title doesn't really describe what's happening, but I couldn't come up with anything better. I have a very simple case and I can't figure out why it behaves the way it does:

name = 'Balikóné'
seq = []
seq.append(name)

print name
print seq[0]
print seq

That's the result:

Balikóné
Balikóné
['Balik\xc3\xb3n\xc3\xa9']

I'm using Python 2.7.5. As the first line of my code I have

# -*- coding: utf-8 -*-

to let Python know that I have some non-ASCII characters in my string. Otherwise I get:

Non-ASCII character '\xc3' in file 'my_path', but no encoding declared

Why does it look different when I'm printing the list and an item from the list?

No, the string never changed its encoding. Your *terminal* interpreted the encoding and displayed codepoints you recognized instead. — Martijn Pieters, Apr 29 '14 at 15:23
When you print a list, you get the `repr` of the items in the list, which for non-ASCII bytes shows them as `\x` escapes. Related: http://stackoverflow.com/questions/17560620/print-a-list-that-contains-chinese-characters-in-python — Wooble, Apr 29 '14 at 15:23
But is there a way to save this [seq] to a file now without losing the non-ASCII characters? — Desprit, Apr 29 '14 at 15:27
@Desprit: the characters have *already* been encoded (to UTF8), so you can write those byte strings to a file without problems. — Martijn Pieters, Apr 29 '14 at 15:28
@Martijn Pieters: If I write the whole [seq] to a file I get the same result: ['Balik\xc3\xb3n\xc3\xa9']. The only thing that works is writing each seq[item] to the file one by one instead of the whole [seq]... — Desprit, Apr 29 '14 at 15:30
@Desprit: sure, but that's because you are writing `str(listobj)` to the file, same as what `print` writes to `stdout`. You get yourself a Python string representation of the object. That's a different issue; don't write lists directly to a file. — Martijn Pieters, Apr 29 '14 at 15:31
@Martijn Pieters: So that's the only way to write to a file without losing non-ASCII, right? — Desprit, Apr 29 '14 at 15:32
By the way, in Python 2.7 you should be marking text strings with `u` at the beginning: `name = u'Balikóné'` along with the `# -*- coding` bit. http://bit.ly/unipain is a good thing to watch and/or read. — Wooble, Apr 29 '14 at 15:33
Thank you @Wooble and @Martijn Pieters for the links, I'll read them to better understand all this encoding stuff. — Desprit, Apr 29 '14 at 15:36
@Desprit: There's plenty of ways. Writing the whole list object to a file doesn't technically lose the non-ASCII non-printable codepoints *either*; they've just been encoded using Python's escape mechanisms. You could still recover the original byte values by using `ast.literal_eval()` for example. Not that it is a good idea, nor very interoperable with other tools. — Martijn Pieters, Apr 29 '14 at 15:36
@Martijn Pieters: Thanks for the advice! I have used ast.literal_eval() in many places before but never to recover bytes. — Desprit, Apr 29 '14 at 15:45
@Martijn Pieters: Yeah, that worked for me. I'll use literal_eval in this case! Thank you!! — Desprit, Apr 29 '14 at 15:49
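
The points made in the comments can be sketched roughly as follows. This is only an illustration, written so that it should run under Python 3 as well as 2.7; the file name `names.txt` and the helpers `write_items` and `read_items` are hypothetical and not from the thread. It shows that the list's `repr` merely escapes the UTF-8 bytes, that `ast.literal_eval()` can parse that escaped form back, and that writing text items one per line with an explicit encoding avoids the escapes altogether.

```python
# -*- coding: utf-8 -*-
import ast
import io
import os
import tempfile

name = u'Balikóné'            # text string, marked with u as suggested in the comments
raw = name.encode('utf-8')    # the UTF-8 byte string the question was working with
seq = [raw]

# Printing a list shows the repr of each item, so non-ASCII bytes
# appear as \x escapes; the bytes themselves are unchanged.
listing = repr(seq)

# The escaped form can be parsed back into the original byte values.
recovered = ast.literal_eval(listing)


def write_items(path, items):
    """Write one text item per line, encoded as UTF-8 (hypothetical helper)."""
    with io.open(path, 'w', encoding='utf-8') as handle:
        for item in items:
            handle.write(item + u'\n')


def read_items(path):
    """Read the items back as text strings (hypothetical helper)."""
    with io.open(path, encoding='utf-8') as handle:
        return [line.rstrip(u'\n') for line in handle]


# Assumption: a throwaway temp directory stands in for the real output path.
path = os.path.join(tempfile.mkdtemp(), 'names.txt')
write_items(path, [name])
restored = read_items(path)

print(listing)
print(recovered == seq)
print(restored == [name])
```

Under Python 2.7 the first `print` shows the list as in the question; under Python 3 the same bytes are shown with a `b` prefix. In both cases the file written by `write_items` contains the readable name, not the escaped form.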
