I am using RoboBrowser (which uses BeautifulSoup) to extract links from a website. Some of these links contain Unicode characters, and I am having trouble getting Python to interpret them correctly.
For example, a link contains this Cyrillic character
п
which is URL-encoded as
%D0%BF
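For reference, here is a minimal Python 3 sketch (not RoboBrowser code) showing that `%D0%BF` stands for two raw bytes, and that those bytes decode to U+043F only when read as UTF-8. It uses only `urllib.parse` from the standard library.

```python
from urllib.parse import quote, unquote_to_bytes

# "%D0%BF" encodes two raw bytes, which are the UTF-8 form of U+043F.
raw = unquote_to_bytes("%D0%BF")
print(raw)                         # b'\xd0\xbf'
print(ascii(raw.decode("utf-8")))  # '\u043f'

# Going the other way: quote() encodes text as UTF-8 by default.
print(quote("\u043f"))             # %D0%BF
```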
BeautifulSoup gives me
u'\xd0\xbf'
which looks correct to me, but it prints as
Ð¿
which corresponds to the byte array
'c3 90 c2 bf'
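One way to reproduce exactly these values, sketched in Python 3 under the assumption (not confirmed by anything shown here) that somewhere the percent-escapes are decoded as Latin-1 instead of UTF-8: each byte then becomes the code point with the same number, and re-encoding those two code points as UTF-8 takes two bytes each. `bytes.hex(sep)` needs Python 3.8+.

```python
from urllib.parse import unquote

# Decoding the percent-escapes with the wrong codec: Latin-1 maps each
# byte (0xD0, 0xBF) to the code point with the same number.
wrong = unquote("%D0%BF", encoding="latin-1")
print(ascii(wrong))  # '\xd0\xbf'

# Each of those code points needs two bytes in UTF-8,
# which is where 'c3 90 c2 bf' comes from.
for ch in wrong:
    print(f"U+{ord(ch):04X} -> {ch.encode('utf-8').hex(' ')}")
# U+00D0 -> c3 90
# U+00BF -> c2 bf
print(wrong.encode("utf-8").hex(" "))  # c3 90 c2 bf
```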
The correct value appears to be
u'\u043f'
which encodes to the correct byte array and also prints correctly:
u'\u043f'.encode("utf-8").encode("hex")
'd0bf'
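The check above is Python 2 syntax; in Python 3, `bytes` objects have no `.encode()` method. A sketch of the same check in Python 3 using the standard library:

```python
import binascii

encoded = "\u043f".encode("utf-8")

# Python 3 equivalents of the Python 2 .encode("hex") call.
print(binascii.hexlify(encoded))  # b'd0bf'
print(encoded.hex())              # d0bf
```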
I'm guessing I'm doing something wrong, so the question is: how do I get from
u'\xd0\xbf' to u'\u043f'?
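A minimal sketch of the usual repair, assuming the string really is UTF-8 bytes that were mis-decoded as Latin-1 (the byte values above are consistent with this, but where it happens inside RoboBrowser/BeautifulSoup is not shown). `fix_mojibake` is a hypothetical helper name, written for Python 3:

```python
def fix_mojibake(s):
    """Re-read a string whose code points are really UTF-8 byte values.

    Hypothetical helper. Assumes the text was decoded as Latin-1 when it
    should have been decoded as UTF-8, so every character is in
    U+0000..U+00FF; otherwise encode("latin-1") raises UnicodeEncodeError.
    """
    # Latin-1 maps U+0000..U+00FF one-to-one back to bytes 0x00..0xFF.
    raw = s.encode("latin-1")   # '\xd0\xbf' -> b'\xd0\xbf'
    return raw.decode("utf-8")  # b'\xd0\xbf' -> '\u043f'


print(ascii(fix_mojibake("\xd0\xbf")))  # '\u043f'
```

In Python 2 the same two calls work directly on the unicode object: `u'\xd0\xbf'.encode('latin-1').decode('utf-8')` gives `u'\u043f'`. If you control the step that decodes the URL, decoding it as UTF-8 in the first place avoids the round trip.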