Python Unicode: Can I get back my ñ?

Question

I've read several SO questions and blog posts on how Python deals with Unicode, but I'm still a bit confused. I was scraping with Scrapy and got this from a web page: u'Isla de Se\xf1orita'. It should be u'Isla de Señorita'. I know I can do something like:

>>> u"ñ"
u'\xf1'
>>> u"ñ".encode("utf-8")
'\xc3\xb1'

But what am I supposed to do with this? Can I get u"ñ" back out of these bytes? I just want the ñ so that I can save it to a field in a Django model. Thanks.

Python is trying to be helpful here and produces ASCII-friendly debug output. ñ is U+00F1 in Unicode, so Python displays `\xf1` to indicate the value in a way that won't break even when copied and pasted into a terminal or editor that cannot handle anything but ASCII. — Martijn Pieters, Dec 05 '14 at 20:58
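
To make the comment above concrete, here is a minimal sketch (not from the original thread) suggesting that `\xf1` is only an escaped way of displaying a single character, not a change to the string itself. The variable names are illustrative, and the code is written on the assumption that it should run on both Python 2.7 and Python 3.

```python
import unicodedata

# The scraped value from the question, written with an ASCII-only escape.
scraped = u"Isla de Se\xf1orita"

# The string holds 16 characters; the escape stands for exactly one of them.
length = len(scraped)

# The character at index 10 is U+00F1, i.e. the letter the asker expected.
char_name = unicodedata.name(scraped[10])

# An ASCII-safe rendering similar to what the Python 2 repr displays.
escaped = scraped.encode("unicode_escape")

print(length)     # 16
print(char_name)  # LATIN SMALL LETTER N WITH TILDE
```

In other words, the value that came back from the scraper most likely already contains the ñ; only the debug display escapes it.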

score 1 · Answer 1 · answered Dec 05 '14 at 20:58


Your ñ is still there. It's just encoded differently. Check out this action in my Python interpreter:

>>> print '\xc3\xb1'
ñ

Maybe I'm not clear on what you mean by "get it back"?
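
If "get it back" means turning the UTF-8 bytes into the Unicode string again, a round trip along these lines should do it. This is a minimal sketch with made-up variable names, assuming the bytes really are UTF-8; it is written to run on both Python 2.7 and Python 3.

```python
# Start from the Unicode character in the question.
text = u"\xf1"

# Encoding produces the two UTF-8 bytes shown in the question.
data = text.encode("utf-8")

# Decoding with the same codec recovers the original Unicode string.
restored = data.decode("utf-8")

print(restored == text)  # True
```

For saving to a Django model, the Unicode string itself (here `text` or `restored`) is normally what gets assigned to the field, so no manual encoding step should be needed.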

answered Dec 05 '14 at 20:58

Magenta Nova


When I enter that same line of code, I get `├▒`, but if I `print u"ñ"`, I get back `ñ`. Why is that? But anyway, I never tried printing it out >_<, so I can see what's going on now. I think my error is probably related to something else. – pyramidface Dec 05 '14 at 21:01
@pyramidface: writing UTF-8 to a terminal or console only works if that terminal or console is actually configured to handle UTF-8. Yours is not. See the post I closed yours as a duplicate of. – Martijn Pieters Dec 05 '14 at 21:03
@MartijnPieters Ah okay, thanks for clearing that up. – pyramidface Dec 05 '14 at 21:05
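
The `├▒` output mentioned in the comments can be reproduced without a misconfigured console. The sketch below assumes the console was using code page 437 (the thread does not say which encoding it actually used; `cp437` is a guess that happens to match the symptom). It reinterprets the UTF-8 bytes for ñ under that code page, and is written to run on both Python 2.7 and Python 3.

```python
import sys

# The two UTF-8 bytes for U+00F1, as in the answer above.
utf8_bytes = u"\xf1".encode("utf-8")

# Assumption: the console interprets raw bytes as code page 437.
# Each byte then maps to its own box-drawing or shading character.
misread = utf8_bytes.decode("cp437")

# Decoding with the codec that produced the bytes gives the right result.
correct = utf8_bytes.decode("utf-8")

# Shows which encoding Python believes the current output stream uses.
print(sys.stdout.encoding)
```

Here `misread` is the two-character string U+251C U+2592, which displays as `├▒`, while `correct` is the single character ñ. Printing a Unicode string lets Python encode it for the console, which would explain why `print u"ñ"` worked for the asker.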
