Python - Remove apostrophe from Regular Expression

Question

I have the following regular expression to extract song names from a certain website:

<h2 class="chart-row__song">(.*?)</h2>

It displays the results below:

Where ' appears in the output below, there is an apostrophe on the website the song name is extracted from.

How would I go about changing my regular expression to remove those characters? '

TIA

Oops, you forgot to post your code! StackOverflow is about helping people fix their code. It's not a free coding service. Any code is better than no code at all. Meta-code, even, will demonstrate how you're thinking a program should work, even if you don't know how to write it. — ghoti, May 21 '16 at 13:04
@ghoti The line starting with <h2 class="chart-row__song"> is the code 'regular expression'... — , May 21 '16 at 13:05
Firstly, [don't parse HTML with regular expressions](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). Secondly, a regex is not suited to this task. Those things are called HTML entities; do a search on them. — Alex Hall, May 21 '16 at 13:05
That doesn't look like Python to me. A regular expression is used to MATCH content, not alter it (like removing strings from it). You've tagged your question [tag:python], so please include your attempt to achieve this in Python. — ghoti, May 21 '16 at 13:06
@ghoti it's not really fair to expect a beginner to understand what's going on here. They could do a simple string replace and it would work but be a poor solution. — Alex Hall, May 21 '16 at 13:07
@AlexHall I agree; the standard processes for handling poor-quality questions don't really accommodate [XY problems](http://mywiki.wooledge.org/XyProblem) like this. (I.e. "Please help me *use foo* to achieve bar" rather than "Please help me *achieve bar*".) Nevertheless, instructions [are readily available](http://stackoverflow.com/help/how-to-ask), and I feel that we improve the site by encouraging those who use them. — ghoti, May 21 '16 at 13:10

score 1 · Answer 1 · answered May 21 '16 at 13:21

As stated in the comments, you can't do that using a regex alone. You need to unescape the HTML entities present in the match separately.

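import html
import re

# mystring is assumed to already hold the page's HTML source
regex = re.compile(r'<h2 class="chart-row__song">(.*?)</h2>')
# findall() returns each captured song name; html.unescape() decodes its HTML entities
result = [html.unescape(s) for s in regex.findall(mystring)]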
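
For a version that runs on its own, here is a minimal sketch of the same idea. The sample markup, the song titles, and the names `sample_html`, `SONG_RE` and `extract_song_names` are made up for illustration and are not from the question. The sketch also assumes the apostrophes arrive as the numeric entity `&#39;`, though `html.unescape` handles other forms as well.

```python
import html
import re

# Hypothetical sample markup standing in for the downloaded page source.
sample_html = (
    '<h2 class="chart-row__song">Can&#39;t Stop The Feeling!</h2>\n'
    '<h2 class="chart-row__song">Don&#39;t Let Me Down</h2>\n'
    '<h2 class="chart-row__song">One Dance</h2>'
)

# Same pattern as in the question, compiled once for reuse.
SONG_RE = re.compile(r'<h2 class="chart-row__song">(.*?)</h2>')


def extract_song_names(page_source):
    """Return the captured song names with HTML entities decoded."""
    return [html.unescape(name) for name in SONG_RE.findall(page_source)]


songs = extract_song_names(sample_html)
print(songs)
```

The regex only locates the text. Decoding is a separate step, so named entities such as `&amp;` are handled the same way without touching the pattern. Note that `html.unescape` requires Python 3.4 or later.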
