
I have a string, say

s1 = '<h3><a id="_a50ezru0fkt"></a>Overview</h3><p>is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry\'s standard dummy text</p><h3><a id="_a50ezpu0fgr"></a><p>is simply dummy text of the pr</p>'

I need to remove all the empty <a> tags from s1. There can be several such empty <a> tags in the string, and the id can take any value.

The final result should be:

s1 = '<h3>Overview</h3><p>is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry\'s standard dummy text</p><h3><p>is simply dummy text of the pr</p>'

I would like to achieve this using a regular expression, please.

gm-123
  • You hit an old classic. For your enlightenment and fun, read the top three answers to this question: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/ – PA. Apr 26 '21 at 12:26
  • Another "classic" is this "please write code for me" trope. We are more than happy to help you solve a _specific_ programming problem, but you need to show some effort, starting with searching before asking. – tripleee Apr 26 '21 at 12:29

1 Answer


Does something like this help?


import re


s1 = '<h3><a id="_a50ezru0fkt"></a>Overview</h3><p>is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry\'s standard dummy text</p><h3><a id="_a50ezpu0fgr"></a><p>is simply dummy text of the pr</p>'

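# Lazily match an opening <a ...> tag that is immediately followed by </a>,
# and replace each match with an empty string.
s2 = re.sub(r'(<a.*?></a>)', '', s1)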

print(s2)


'''<h3>Overview</h3><p>is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text</p><h3><p>is simply dummy text of the pr</p>'''
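
One caveat with a lazy `.*?`: if the string also contains a non-empty link before an empty one, the match can start at the earlier `<a` and swallow everything up to the first `></a>`. A slightly stricter pattern avoids that. This is only a sketch: the names `EMPTY_ANCHOR`, `strip_empty_anchors`, and `sample` are made up for illustration, and allowing whitespace between the tags is an assumption about what counts as "empty".

```python
import re

# \b     : stops "<a" from matching other tags such as <abbr> or <article>
# [^>]*  : keeps the attribute match inside a single opening tag
# \s*    : also treats whitespace-only anchors as empty (an assumption)
EMPTY_ANCHOR = re.compile(r'<a\b[^>]*>\s*</a>', re.IGNORECASE)


def strip_empty_anchors(html):
    """Return html with every empty <a ...></a> pair removed."""
    return EMPTY_ANCHOR.sub('', html)


# Hypothetical input: a real link, an empty anchor, and an unrelated <abbr>.
sample = '<a href="/x">link</a><h3><a id="_a50ezru0fkt"></a>Overview</h3><abbr></abbr>'

print(strip_empty_anchors(sample))
# <a href="/x">link</a><h3>Overview</h3><abbr></abbr>
```

This still assumes that attribute values never contain a literal `>`. For arbitrary HTML, an HTML parser is likely to be more robust than any regular expression.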

KJDII