How to handle HTML entities in parsed text - Python

Question

I have parsed text that contains HTML versions of different symbols, such as quotation marks or dashes.

This is what one string looks like:

Introduction &#8211 First page&#8218s content

And I would like to achieve this:

Introduction - First page's content

Is there a library or common solution that converts the HTML entities in any string? Or do I need to write a function that replaces each HTML entity with the proper character?

I already checked these answers, but I need something that works on a plain Python string that contains HTML entities.

score 1 · Accepted Answer · answered Jul 09 '17 at 22:13


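The standard-library html module doesn't require anything special from the string. html.unescape (available since Python 3.4) works on any plain Python string, even when the entities lack their trailing semicolon, as they do here: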

>>> import html
>>> html.unescape('Introduction &#8211 First page&#8218s content')
'Introduction – First page‚s content'
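Note that the decoded characters are an en dash (U+2013) and a single low-9 quotation mark (U+201A), not the ASCII hyphen and apostrophe shown in the question's desired output. If plain ASCII punctuation is what is wanted, one possible follow-up step is to translate the decoded characters afterwards. The sketch below is only an illustration: the helper name unescape_to_ascii and the choice of which characters to flatten are assumptions, not part of the html module.

```python
import html

# Hypothetical mapping: which typographic characters get flattened to ASCII
# is an assumption made for this example; extend it as needed.
ASCII_EQUIVALENTS = str.maketrans({
    "\u2013": "-",   # en dash (&#8211;)
    "\u2014": "-",   # em dash (&#8212;)
    "\u201a": "'",   # single low-9 quotation mark (&#8218;)
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def unescape_to_ascii(text):
    """Decode HTML entities, then swap typographic punctuation for ASCII."""
    return html.unescape(text).translate(ASCII_EQUIVALENTS)


result = unescape_to_ascii('Introduction &#8211 First page&#8218s content')
print(result)  # Introduction - First page's content
```

Keeping the decoding and the ASCII substitution as two separate steps means html.unescape alone can still be used wherever the original typographic characters should be preserved.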


pythad


score 0 · Answer 2 · answered Jul 09 '17 at 22:16


Try

print unicode(x)

or

print x.encode('ascii')


simo

