0

I have parsed text that contains HTML entity versions of symbols such as quotation marks and dashes.

This is what one string looks like:

Introduction &#8211 First page&#8218s content

And I would like to achieve this:

Introduction - First page's content

Is there a library or common solution that converts the HTML entities in a string? Or do I need to write a function that replaces each entity with the proper character?

I already checked these answers, but I need something that works with a plain Python string that contains HTML entities.

rihekopo

2 Answers

1

The html module doesn't require anything special from the string. It just works:

>>> import html
>>> html.unescape('Introduction &#8211 First page&#8218s content')
'Introduction – First page‚s content'
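
Note that html.unescape is faithful to the code points: &#8211 is an en dash (U+2013) and &#8218 is a single low-9 quotation mark (U+201A), so the result is not the plain hyphen and apostrophe shown in the question. If ASCII punctuation is what's wanted, one option is to translate the characters after unescaping. This is only a minimal sketch: the mapping table and the function name are assumptions made for illustration, not part of the html module, and the table would need extending for other symbols.

```python
import html

# Hypothetical mapping: which ASCII stand-in to use for each
# typographic character is an assumption, adjust as needed.
ASCII_FALLBACKS = str.maketrans({
    "\u2013": "-",   # en dash (&#8211;)
    "\u2014": "-",   # em dash (&#8212;)
    "\u2018": "'",   # left single quotation mark (&#8216;)
    "\u2019": "'",   # right single quotation mark (&#8217;)
    "\u201a": "'",   # single low-9 quotation mark (&#8218;)
    "\u201c": '"',   # left double quotation mark (&#8220;)
    "\u201d": '"',   # right double quotation mark (&#8221;)
})


def unescape_to_ascii_punctuation(text):
    """Decode HTML entities, then swap typographic punctuation for ASCII."""
    return html.unescape(text).translate(ASCII_FALLBACKS)


print(unescape_to_ascii_punctuation(
    'Introduction &#8211 First page&#8218s content'))
```

Characters that are not in the table pass through unchanged, so other decoded symbols keep their Unicode form.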
pythad
0

Try

print unicode(x)

or

print x.encode('ascii')

simo