I would like to decode a MACCYRILLIC-encoded string, for example "%EE%F2_%E4%EE%E1%F0%E0_%E4%EE%E1%F0%E0_%ED%E5_%E8%F9%F3%F2". How can I do this in Python 2?

phrase.decode("MACCYRILLIC") has no effect.

nik

1 Answer

urllib — Open arbitrary resources by URL

urllib.unquote(string)

Replace %xx escapes by their single-character equivalent.

Example: unquote('/%7Econnolly/') yields '/~connolly/'.
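This also suggests why a bare decode call appears to do nothing: until the %xx escapes are replaced, the string holds only ASCII characters, and those map to themselves in the Cyrillic code pages. A minimal sketch of that observation, written so that it runs on both Python 2 and 3 (the variable names are illustrative, not from the question):

```python
# The question's sample: percent-escaped text, i.e. plain ASCII characters.
escaped = "%EE%F2_%E4%EE%E1%F0%E0_%E4%EE%E1%F0%E0_%ED%E5_%E8%F9%F3%F2"

# Get the ASCII bytes explicitly so the same lines run on Python 2 and 3.
ascii_bytes = escaped.encode("ascii")

# Decoding ASCII-only bytes as Mac Cyrillic yields the same characters back,
# because the escapes were never turned into raw bytes.
still_escaped = ascii_bytes.decode("mac_cyrillic")
```

Here still_escaped compares equal to the original string, which matches the "no effect" behaviour described in the question.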

==> py -2
Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:24:40) [MSC v.1500 64 bit (AMD64)] on win32

Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import urllib
>>> MACCYRILLIC = "%EE%F2_%E4%EE%E1%F0%E0_%E4%EE%E1%F0%E0_%ED%E5_%E8%F9%F3%F2"
>>> print urllib.unquote(MACCYRILLIC).decode('cp1251')
от_добра_добра_не_ищут
>>>

Edit. Another approach (step by step):

==> py -2
Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:24:40) [MSC v.1500 64 bit (AMD64)] on win32

Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> import codecs
>>> MACCYRILLIC = '%EE%F2_%E4%EE%E1%F0%E0_%E4%EE%E1%F0%E0_%ED%E5_%E8%F9%F3%F2'
>>> x = urllib.unquote(MACCYRILLIC)  # byte string; not decoded yet
>>> print repr(x)
'\xee\xf2_\xe4\xee\xe1\xf0\xe0_\xe4\xee\xe1\xf0\xe0_\xed\xe5_\xe8\xf9\xf3\xf2'
>>> y = codecs.decode(x, 'cp1251')
>>> print y
от_добра_добра_не_ищут
>>>
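The two steps above can be wrapped in a small helper. This is only a sketch: decode_percent_escaped is a hypothetical name, the cp1251 default simply mirrors the sessions above, and the import fallback is an assumption added so the same code also runs on Python 3, where urllib.parse.unquote_to_bytes plays the role that urllib.unquote plays for byte strings in Python 2.

```python
try:
    # Python 2: unquote() on a byte string returns a byte string.
    from urllib import unquote as unquote_to_bytes
except ImportError:
    # Python 3: the bytes-returning variant lives in urllib.parse.
    from urllib.parse import unquote_to_bytes


def decode_percent_escaped(text, encoding="cp1251"):
    """Replace %xx escapes with raw bytes, then decode them to Unicode text.

    The default encoding is an assumption taken from the sessions above.
    """
    raw = unquote_to_bytes(text)  # step 1: '%EE' becomes the byte 0xEE
    return raw.decode(encoding)   # step 2: bytes become Unicode text
```

Calling decode_percent_escaped on the string from the question should return the same text as the sessions above. For this particular sample, passing "mac_cyrillic" as the encoding should give the same result, since the sample contains only lowercase letters that both code pages place at the same byte values; the two code pages differ for other characters, such as uppercase letters.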

All of the above works provided that stdout reports a UTF-8 encoding:

>>> import sys
>>> sys.stdout.encoding
'utf-8'
>>> print sys.stdout.encoding
utf-8
>>>

Unfortunately, the example at http://rextester.com/XAX79891 shows that sys.stdout.encoding is None there (and I don't know of a way to change it to utf-8). Read more in Lennart Regebro's answer to "Stdout encoding in python":

A better generic solution under Python 2 is to treat stdout as what it is: An 8-bit interface. And that means that anything you print to stdout should be 8-bit. You get the error when you are trying to print Unicode data, because print will then try to encode the Unicode data to the encoding of stdout, and if it's None it will assume ASCII, and fail, unless you set PYTHONIOENCODING.
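In practice, the advice quoted above amounts to encoding the text yourself before it reaches stdout. A rough sketch, assuming UTF-8 is an acceptable fallback when sys.stdout.encoding is None; encode_for_stdout and write_text are hypothetical helper names, and the buffer lookup is there only so the sketch also runs on Python 3:

```python
import sys


def encode_for_stdout(text, stream_encoding=None, fallback="utf-8"):
    """Encode Unicode text to bytes, using `fallback` if the stream has no encoding."""
    encoding = stream_encoding or fallback
    # "replace" substitutes "?" for unencodable characters instead of raising.
    return text.encode(encoding, "replace")


def write_text(text, stream=None):
    """Write text to a stream as already-encoded bytes, followed by a newline."""
    if stream is None:
        stream = sys.stdout
    data = encode_for_stdout(text, getattr(stream, "encoding", None))
    # Python 3 text streams expose their byte interface as .buffer;
    # on Python 2, sys.stdout accepts byte strings directly.
    getattr(stream, "buffer", stream).write(data + b"\n")
```

In the Python 2 sessions above this would mean calling write_text(y), or simply print y.encode('utf-8'), in place of print y. The other route mentioned in the quote is to set the PYTHONIOENCODING environment variable (for example to utf-8) before starting the interpreter, where the environment allows it.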

JosefZ
  • http://rextester.com/XAX79891 it doesn't work. =( I have the same result: `UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)` – nik Jun 16 '17 at 15:10
  • @nik As you can see, the solution works in standard Python 2.7; however, it fails in the rextester.com emulator (explained in the updated answer). – JosefZ Jun 16 '17 at 17:01