Remove Sub-String of Pattern from a String in Python

Question

I have a string, say

s1 = '<h3><a id="_a50ezru0fkt"></a>Overview</h3><p>is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry\'s standard dummy text</p><h3><a id="_a50ezpu0fgr"></a><p>is simply dummy text of the pr</p>'

I need to remove all the empty <a> tags from s1. There can be several such empty <a> tags in the string, and the id can take any value.

The final result should be:

s1 = '<h3>Overview</h3><p>is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry\'s standard dummy text</p><h3><p>is simply dummy text of the pr</p>'

I would like to achieve this using a regular expression, please.

You hit an old classic. For your enlightenment and fun, read the three top answers to this question: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/ — PA., Apr 26 '21 at 12:26
Another "classic" is this "please write code for me" trope. We are more than happy to help you solve a _specific_ programming problem, but you need to exhibit some effort, starting with searching before asking. — tripleee, Apr 26 '21 at 12:29

score 2 · Answer 1 · answered Apr 26 '21 at 12:26

Does something like this help?


import re

s1 = '<h3><a id="_a50ezru0fkt"></a>Overview</h3><p>is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry\'s standard dummy text</p><h3><a id="_a50ezpu0fgr"></a><p>is simply dummy text of the pr</p>'

# Lazily match an opening <a ...> tag that is immediately followed by </a>,
# and replace every such match with the empty string.
s2 = re.sub(r'<a.*?></a>', '', s1)

print(s2)

# Output:
# <h3>Overview</h3><p>is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text</p><h3><p>is simply dummy text of the pr</p>
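
One caveat with the lazy pattern above: `<a.*?></a>` starts matching at the first `<a` it sees and keeps going until it finds `></a>`, so if the string also contains a non-empty link such as `<a href="u">text</a>` before an empty one, the whole stretch gets removed. It would also start matching at tags like `<abbr`. A slightly tighter sketch is shown below. It assumes that "empty" means an opening `<a ...>` tag directly followed by `</a>` (optionally with whitespace between them) and that attribute values never contain a literal `>`. The names `EMPTY_A` and `strip_empty_anchors` are made up for this illustration.

```python
import re

# Assumption: an "empty" anchor is <a ...> directly followed by </a>,
# with at most whitespace in between.
# \b stops the pattern from matching other tags that start with "a" (e.g. <abbr>).
# [^>]* keeps the match inside a single opening tag instead of running
# across later tags, which the lazy .*? can do.
EMPTY_A = re.compile(r'<a\b[^>]*>\s*</a>')


def strip_empty_anchors(html):
    """Return html with every empty <a ...></a> pair removed."""
    return EMPTY_A.sub('', html)


# Hypothetical sample mixing a real link with an empty anchor.
sample = '<h3><a id="_a50ezru0fkt"></a>Overview</h3><p><a href="u">text</a><a id="y"></a></p>'

print(strip_empty_anchors(sample))
```

For the exact string in the question both patterns give the same result; the difference only shows up once real links appear in the input. For anything beyond simple, predictable markup like this, an HTML parser is the safer tool, as the linked question in the comments explains.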
