I've read several SO questions and blog posts on how Python deals with Unicode, but I'm still a bit confused. I was scraping a web page with Scrapy and got this: u'Isla de Se\xf1orita'. It should be u'Isla de Señorita'. I know I can do something like:

>>> u"ñ"
u'\xf1'
>>> u"ñ".encode("utf-8")
'\xc3\xb1'

But what am I supposed to do with this? Can I get u"ñ" back out of these bytes? I just want the ñ so that I can save it to a field in a Django model. Thanks.

pyramidface
  • Python is trying to be helpful here and produces ASCII-friendly debug output. ñ is U+00F1 in Unicode, so Python displays `\xf1` to indicate the value in a way that won't break even when copied and pasted into a terminal or editor that cannot handle anything but ASCII. – Martijn Pieters Dec 05 '14 at 20:58

1 Answer

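Your ñ is still there; it's just being shown in escaped form. u'\xf1' is how the interpreter displays the Unicode string, and '\xc3\xb1' is the same character encoded as UTF-8. Check out what happens when I print the bytes in my Python interpreter: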

>>> print '\xc3\xb1'
ñ

Maybe I'm not clear on what you mean by "get it back"?
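
If "get it back" means turning the UTF-8 bytes into the Unicode string again, decoding should do it. Here is a minimal sketch of the round trip; the variable names are only for illustration, and it is written so that it should run on both Python 2.7 and Python 3:

```python
# Round trip between a Unicode string and its UTF-8 bytes.
# Variable names are illustrative; they are not from the original post.

text = u"Isla de Se\xf1orita"      # the value from the question (\xf1 is ñ)

encoded = text.encode("utf-8")     # text -> bytes: ñ becomes the two bytes C3 B1
decoded = encoded.decode("utf-8")  # bytes -> text: the original string again

assert encoded == b"Isla de Se\xc3\xb1orita"
assert decoded == text             # nothing was lost in the round trip
assert decoded[10] == u"\xf1"      # the ñ is a single character at index 10
```

So the scraped value already contains the ñ; the \xf1 only appears when the interpreter shows the repr of the string.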

Magenta Nova
  • When I enter that same line of code, I get `├▒`, but if I `print u"ñ"`, I get back `ñ`. Why is that? Anyway, I had never tried printing it out >_<, so I can see what's going on now. I think my error is probably related to something else. – pyramidface Dec 05 '14 at 21:01
  • @pyramidface: writing UTF-8 to a terminal or console only works if that terminal or console is actually configured to handle UTF-8. Yours is not. See the post I closed yours as a duplicate of. – Martijn Pieters Dec 05 '14 at 21:03
  • @MartijnPieters Ah okay, thanks for clearing that up. – pyramidface Dec 05 '14 at 21:05
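
The `├▒` output can be reproduced without a misconfigured console. This sketch assumes the console was using code page 437, which is a guess since the thread never says which encoding was active; it decodes the UTF-8 bytes the way such a console would interpret them:

```python
# What a non-UTF-8 console makes of the UTF-8 bytes for ñ.
# Assumption: the console uses code page 437 (not stated in the thread).

utf8_bytes = u"\xf1".encode("utf-8")   # the two bytes C3 B1
misread = utf8_bytes.decode("cp437")   # each byte is read as its own character

assert utf8_bytes == b"\xc3\xb1"
assert misread == u"\u251c\u2592"      # U+251C and U+2592, i.e. the characters ├ and ▒
```

Printing the Unicode string instead lets Python encode it for the console, which is likely why `print u"ñ"` displayed correctly. Checking `sys.stdout.encoding` shows which encoding Python picked for the console.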