Decoding text in Python

Question

I want to know how to decode text like this:

\xe2\x80\x93

I know that printing the string displays it correctly, but I am building a web crawler, so I need to build an index (a dictionary) that maps each word to a list of the URLs where it appears.

So I want to do something like this:

dic = {}
dic['\xe2\x80\x93'] = 'http://example.com'  # the URL where the word appears
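As a sketch of the index described above, assuming Python 3 and pages that arrive as UTF-8 bytes (both assumptions, not stated in the question), the raw bytes can be decoded before they are used as dictionary keys. `add_to_index` is a hypothetical helper and the URLs are placeholders.

```python
from collections import defaultdict

# Maps each decoded word to the list of URLs where it appears.
word_index = defaultdict(list)


def add_to_index(index, raw_word, url, encoding="utf-8"):
    """Hypothetical helper: decode the raw bytes, then record the URL once."""
    word = raw_word.decode(encoding)  # b"\xe2\x80\x93" becomes "\u2013" (EN DASH)
    if url not in index[word]:  # skip URLs already listed for this word
        index[word].append(url)


add_to_index(word_index, b"\xe2\x80\x93", "http://example.com")
add_to_index(word_index, b"\xe2\x80\x93", "http://example.org")
add_to_index(word_index, b"\xe2\x80\x93", "http://example.com")  # duplicate, ignored

print(dict(word_index))  # {'–': ['http://example.com', 'http://example.org']}
```

`defaultdict(list)` gives every new word an empty list automatically. The `in` check scans the list, so for a large crawl a `set` of URLs per word would be a reasonable swap.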

... but when I do:

print dic

I get:

{'\xe2\x80\x93': 'http://example.com'}

... instead of –.

But when I do print '\xe2\x80\x93' I successfully get –.

How can I get – when I print dic as well?

Answer

When you see \xhh, that is a character escape sequence: each one gives the value of a single byte as two hexadecimal digits (see: lexical analysis: string-literals). Here, the three bytes \xe2\x80\x93 are the UTF-8 encoding of one character, U+2013 EN DASH (–).
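A minimal Python 3 sketch (the question uses Python 2, where a plain string literal already holds bytes; here the `b"..."` prefix makes that explicit) showing that the three escapes are three bytes which decode to a single character:

```python
import unicodedata

raw = b"\xe2\x80\x93"  # three bytes, each written as a \xhh escape
print(len(raw))  # 3

text = raw.decode("utf-8")  # interpret the bytes as UTF-8
print(len(text))  # 1
print(hex(ord(text)))  # 0x2013
print(unicodedata.name(text))  # EN DASH

# Encoding the character again gives back the original bytes.
assert text.encode("utf-8") == raw
```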

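The reason you sometimes see \xhh and sometimes see the actual character is the difference between __str__ and __repr__ in Python. When you print a string on its own, print writes the string itself, so your terminal receives the raw bytes and displays the character. When you print a dictionary, it shows the repr() of each key and value, and the repr() of a byte string escapes non-ASCII bytes as \xhh.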
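A minimal sketch of the same effect, assuming Python 3 (a `bytes` key stands in for the Python 2 byte string in the question). In Python 3, repr() leaves printable non-ASCII characters unescaped, so decoding the keys is enough for print to show the character:

```python
raw = b"\xe2\x80\x93"  # the UTF-8 bytes from the question
dic = {raw: "http://example.com"}

# Printing a dict uses repr() of every key and value,
# so the bytes key appears with its \xhh escapes.
print(dic)  # {b'\xe2\x80\x93': 'http://example.com'}

# str() of the decoded text is the character; repr() is a quoted literal.
word = raw.decode("utf-8")
print(str(word))  # –
print(repr(word))  # '–'

# Decoding the keys makes print(dict) show the character directly.
decoded = {key.decode("utf-8"): url for key, url in dic.items()}
print(decoded)  # {'–': 'http://example.com'}

# Printing each entry separately uses str() for the word.
for w, url in decoded.items():
    print(w, url)  # – http://example.com
```

In Python 2, printing a dict with unicode keys would still show escapes such as u'\u2013', so there the loop that prints each entry separately is the reliable option.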

edited May 23 '17 at 12:28 by Community

answered Apr 14 '13 at 11:05 by Wesley Baugh
