
Good day to everyone!

Firstly, the title of this topic isn't really what's happening, but I couldn't come up with anything better! I have a very simple case and I can't figure out why it behaves the way it does:

```python
name = 'Balikóné'
seq = []
seq.append(name)

print name
print seq[0]
print seq
```

That's the result:

```
Balikóné
Balikóné
['Balik\xc3\xb3n\xc3\xa9']
```

I'm using Python 2.7.5. As the first line of my code I have

```python
# -*- coding: utf-8 -*-
```

to let Python know that my string contains non-ASCII characters. Otherwise I get:

```
Non-ASCII character '\xc3' in file 'my_path', but no encoding declared
```

Why does it look different when I'm printing the list and when I'm printing an item from the list?
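A minimal sketch of the effect the comments below describe, written for Python 3 rather than the question's 2.7. The assumption is that a Python 2 `str` literal in a UTF-8 source file holds UTF-8 bytes, so a Python 3 `bytes` object stands in for it; Python 3 also adds a `b` prefix to the repr that the 2.7 output lacks. Printing a single item writes its bytes (which the terminal decodes), while converting a list to a string calls `repr()` on each element.

```python
# Python 3 sketch. Assumption: a Python 2.7 str literal in a UTF-8 source file
# holds UTF-8 bytes, so a Python 3 bytes object stands in for it here.
name = 'Balikóné'.encode('utf-8')
seq = [name]

# Decoding the bytes gives back the readable text; this is roughly what the
# terminal does when Python 2 writes the raw bytes of `print name`.
text = name.decode('utf-8')
assert text == 'Balikóné'

# repr() of a byte string escapes every non-ASCII byte as \xNN.
print(repr(name))   # b'Balik\xc3\xb3n\xc3\xa9'

# Turning a list into a string calls repr() on each element, which is why
# `print seq` shows the escapes while `print seq[0]` does not.
print(str(seq))     # [b'Balik\xc3\xb3n\xc3\xa9']
assert str(seq) == '[' + repr(name) + ']'
```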

Martijn Pieters
Desprit
  • No, the string never changed its encoding. Your *terminal* interpreted the encoding and displayed codepoints you recognized instead. – Martijn Pieters Apr 29 '14 at 15:23
  • When you print a list, you get the `repr` of the items in the list, which for non-ASCII bytes shows them as `\x` escapes. Related: http://stackoverflow.com/questions/17560620/print-a-list-that-contains-chinese-characters-in-python – Wooble Apr 29 '14 at 15:23
  • But is there a way to save this [seq] to a file now without losing the non-ASCII characters? – Desprit Apr 29 '14 at 15:27
  • @Desprit: the characters have *already* been encoded (to UTF8), so you can write those byte strings to a file without problems. – Martijn Pieters Apr 29 '14 at 15:28
  • @Martijn Pieters: If I write the whole [seq] to a file I get the same result: ['Balik\xc3\xb3n\xc3\xa9']. The only thing that works is writing the items of [seq] to the file one by one instead of the whole list... – Desprit Apr 29 '14 at 15:30
    @Desprit: sure, but that's because you are writing `str(listobj)` to the file, same as what `print` writes to `stdout`. You get yourself a Python string representation of the object. That's a different issue; don't write lists directly to a file. – Martijn Pieters Apr 29 '14 at 15:31
  • @Martijn Pieters: So that's the only way to write to a file without losing the non-ASCII characters, right? – Desprit Apr 29 '14 at 15:32
  • By the way, in Python 2.7 you should be marking text strings with `u` at the beginning: `name = u'Balikóné'` along with the `# -*- coding` bit. http://bit.ly/unipain is a good thing to watch and/or read. – Wooble Apr 29 '14 at 15:33
  • Thank you @Wooble and @Martijn Pieters for the links, I'll read them to better understand all this encoding stuff. – Desprit Apr 29 '14 at 15:36
  • @Desprit: There's plenty of ways. Writing the whole list object to a file doesn't technically lose the non-ASCII non-printable codepoints *either*; they've just been encoded using Python's escape mechanisms. You could still recover the original byte values by using `ast.literal_eval()` for example. Not that it is a good idea, nor very interoperable with other tools. – Martijn Pieters Apr 29 '14 at 15:36
  • @Martijn Pieters: Thanks for the advice! I've used `ast.literal_eval()` in many places before but never to recover bytes. – Desprit Apr 29 '14 at 15:45
  • @Martijn Pieters: Yeah, that worked for me. I'll use `literal_eval` in this case! Thank you!! – Desprit Apr 29 '14 at 15:49
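Following the advice in the comments not to write the list object itself, here is a minimal Python 3 sketch that writes each item's bytes to a file, one per line, and reads them back as UTF-8 text. The temporary path, the `names.txt` filename and the second name are hypothetical sample data; the byte strings again stand in for Python 2 `str` objects.

```python
import os
import tempfile

# Hypothetical sample data: UTF-8 byte strings, like the Python 2 str objects
# in the question (the second name is made up for illustration).
seq = ['Balikóné'.encode('utf-8'), 'Kovács'.encode('utf-8')]

# Hypothetical output location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'names.txt')

# Write each item's bytes directly, one per line, instead of str(seq).
with open(path, 'wb') as f:
    for item in seq:
        f.write(item + b'\n')

# Reading the file back as UTF-8 text yields the original characters.
with open(path, encoding='utf-8') as f:
    names = f.read().splitlines()

# For comparison: str(seq) is the escaped representation that ends up in the
# file when the list object itself is written.
print(str(seq))
```

If the data were kept as text instead (the `u'...'` literals suggested above), opening the file with `encoding='utf-8'` in text mode would let Python do the encoding on write.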
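The `ast.literal_eval()` recovery mentioned above can be sketched like this, again in Python 3 with a `bytes` object standing in for the Python 2 byte string. It only shows that the escaped list representation parses back to the same byte values; as noted in the comments, storing lists this way is not very interoperable with other tools.

```python
import ast

seq = ['Balikóné'.encode('utf-8')]

# The text that ends up in a file when the list object itself is written.
dumped = str(seq)

# literal_eval parses the list literal back into a list of bytes objects,
# so the original byte values are recovered.
restored = ast.literal_eval(dumped)
assert restored == seq

print(restored[0].decode('utf-8') == 'Balikóné')  # True
```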
