Properly unescape html characters

Question

I am trying to grab a movie synopsis, but am having difficulty unescaping some troublesome characters:

import requests
from lxml import html

res = requests.get('https://play.google.com/store/tv/show?id=lXH-sW6govE')
node=html.fromstring(res.content)
synopsis=node.xpath("//div[contains(@class, 'details-section') and contains(@class, 'description')]/meta")[0].attrib['content']

u'&quot;Work Out New York&quot; invites viewers to break a sweat with some of New York City\xe2\x80\x99s hottest personal trainers. They may be friends, but these high-end fitness experts compete against each other to earn the business of wealthy patrons and celebrity clientele. With training techniques and fitness regimens constantly evolving, these trainers better shape up or risk losing their clients to their competitors. Romances, jealousies, and bitter rivalries provide the ultimate test of endurance for these fitness fanatics.'

How would I get the properly-encoded synopsis at https://play.google.com/store/tv/show?id=lXH-sW6govE, that is, ""Work Out New York" invites viewers to break a sweat with some of New York City’s hottest personal trainers. They may be friends, but these high-end fitness experts compete against each other to earn the business of wealthy patrons and celebrity clientele. With training techniques and fitness regimens constantly evolving, these trainers better shape up or risk losing their clients to their competitors. Romances, jealousies, and bitter rivalries provide the ultimate test of endurance for these fitness fanatics.".

score 0 · Accepted Answer · edited May 23 '17 at 12:32

You can use the HTMLParser to unescape this. Something like:

print HTMLParser.HTMLParser().unescape(synopsis)
"Work Out New York" invites viewers to break a sweat with some of New York Cityâs hottest personal trainers. They may be friends, but these high-end fitness experts compete against each other to earn the business of wealthy patrons and celebrity clientele. With training techniques and fitness regimens constantly evolving, these trainers better shape up or risk losing their clients to their competitors. Romances, jealousies, and bitter rivalries provide the ultimate test of endurance for these fitness fanatics.

More details here: How do I unescape HTML entities in a string in Python 3.1?.

Properly unescape html characters

1 Answers1