I would like to decode MACCYRILLIC code, for example "%EE%F2_%E4%EE%E1%F0%E0_%E4%EE%E1%F0%E0_%ED%E5_%E8%F9%F3%F2". How can I do it using Python2?
phrase.decode("MACCYRILLIC")
has no effect.
I would like to decode MACCYRILLIC code, for example "%EE%F2_%E4%EE%E1%F0%E0_%E4%EE%E1%F0%E0_%ED%E5_%E8%F9%F3%F2". How can I do it using Python2?
phrase.decode("MACCYRILLIC")
has no effect.
urllib — Open arbitrary resources by URL
urllib.unquote(string)
Replace
%xx
escapes by their single-character equivalent.Example:
unquote('/%7Econnolly/')
yields'/~connolly/'
.
==> py -2
Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:24:40) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import urllib
>>> MACCYRILLIC = "%EE%F2_%E4%EE%E1%F0%E0_%E4%EE%E1%F0%E0_%ED%E5_%E8%F9%F3%F2"
>>> print urllib.unquote(MACCYRILLIC).decode('cp1251')
от_добра_добра_не_ищут
>>>
Edit. Another approach (step by step):
==> py -2
Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:24:40) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> import codecs
>>> MACCYRILLIC = '%EE%F2_%E4%EE%E1%F0%E0_%E4%EE%E1%F0%E0_%ED%E5_%E8%F9%F3%F2'
>>> #print
... x = urllib.unquote(MACCYRILLIC) #.decode('cp1251')
>>> print repr(x)
'\xee\xf2_\xe4\xee\xe1\xf0\xe0_\xe4\xee\xe1\xf0\xe0_\xed\xe5_\xe8\xf9\xf3\xf2'
>>> y = codecs.decode(x, 'cp1251')
>>> print y
от_добра_добра_не_ищут
>>>
All above would work on the following requirement:
>>> import sys
>>> sys.stdout.encoding
'utf-8'
>>> print sys.stdout.encoding
utf-8
>>>
Unfortunately, the example in http://rextester.com/XAX79891 shows sys.stdout.encoding
None (and I don't know a way of changing it to utf-8
). Read more in Lennart Regebro's answer to Stdout encoding in python:
A better generic solution under Python 2 is to treat
stdout
as what it is: An 8-bit interface. And that means that anything you print to tostdout
should be 8-bit. You get the error when you are trying to print Unicode data, because print will then try to encode the Unicode data to the encoding ofstdout
, and if it's None it will assumeASCII
, and fail, unless you setPYTHONIOENCODING
.