I want to know how to decode text like the following:

\xe2\x80\x93

I know that printing it displays it correctly, but I am building a web crawler, so I need to build an index (a dictionary) that maps each word to the list of URLs where it appears.

Hence I want to do something like this:

dic = {}
dic['\xe2\x80\x93'] = 'http://example.com'  # the URL where the word appears

... but when I do:

print dic

I get:

{'\xe2\x80\x93': 'http://example.com'}

... with the key shown as escapes instead of –.

But when I do print dic['\xe2\x80\x93'] I successfully get –.

How can I get – from print dic as well?

Kara
user2243116

1 Answer

When you see \xhh, that is a character escape sequence. In this case, each escape shows the hexadecimal value of one byte in the string (see: lexical analysis: string-literals).
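
As a small illustration, here is a sketch in Python 3 syntax, where byte strings carry a b prefix (the question uses Python 2, where a plain string literal is already a byte string). The variable names are made up for this example. It shows that the three escapes in the question are the UTF-8 encoding of a single character, the en dash U+2013:

```python
# The three \xhh escapes are three bytes, not three separate characters.
raw = b'\xe2\x80\x93'

# Each escape stands for one byte, written as two hex digits.
byte_values = [hex(b) for b in raw]   # ['0xe2', '0x80', '0x93']

# Decoded as UTF-8, the three bytes form one character: U+2013 (en dash).
text = raw.decode('utf-8')
code_point = hex(ord(text))           # '0x2013'
```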

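The reason you sometimes see \xhh and sometimes see the actual character comes down to the difference between __str__ and __repr__ in Python. Calling print on a string uses its __str__ form, which writes the bytes out as they are. Printing a container such as a dict shows the __repr__ of each key and value, and the __repr__ of a string escapes non-ASCII bytes as \xhh.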
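
One way around this is to build the output yourself instead of printing the dict directly. The following is a minimal sketch in Python 3 syntax, assuming the keys are UTF-8 encoded byte strings as in the question; format_index is a hypothetical helper written for this answer, not a built-in:

```python
def format_index(index):
    """Return one 'word: url' line per entry, decoding UTF-8 byte keys to text."""
    lines = []
    for word, url in index.items():
        if isinstance(word, bytes):
            # Assumption: the crawler stored the word as UTF-8 bytes.
            word = word.decode('utf-8')
        lines.append('{}: {}'.format(word, url))
    return '\n'.join(lines)


index = {b'\xe2\x80\x93': 'http://example.com'}

# What print(index) displays: the repr of the key, escapes intact.
as_repr = repr(index)

# A readable rendering that uses the decoded text of each key.
as_text = format_index(index)
```

Calling print(as_text) should then show the en dash followed by the URL, provided the terminal can display UTF-8, while print(as_repr) still shows the escaped key.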

Community
Wesley Baugh