I am parsing a bunch of HTML and need to remove certain <a> tags when certain text appears in or right after them.
Normally I'd use Goquery, but the text I am searching for often falls outside the HTML tag itself. For instance, take this HTML:
<html><body>
This is the start.
<a href="http://example.com/path">We don't want to match this text.</a>
<a href="http://www.example.com/another/path" style="font-family:Arial, Helvetica, 'sans-serif'; color:#838383;font-size:12px; line-height:14px"></a> match this text.<a href="blah">We also don't want to match this text</a>
</body></html>
I am using this regexp, but it fails because it also matches the text I don't want to match:
(?is)<a[^>]+href=["'](?P<link>.*?)["']*.?> match this text\.
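
A minimal Go sketch of what seems to be going on, plus one possible tightening. With the `s` flag, the lazy `.*?` in the `link` group is free to run past the closing quote and across tag boundaries, so the leftmost match starts at the first `<a` and swallows everything up to ` match this text.`. The `tightened` pattern, the `tail` group, and the helpers `firstLink` and `stripAnchors` are hypothetical names introduced for illustration. The sketch assumes the unwanted anchor contains no nested tags and that only the anchor should be removed while the trailing text stays.

```go
package main

import (
	"fmt"
	"regexp"
)

// sample is the HTML from the question.
const sample = `<html><body>
This is the start.
<a href="http://example.com/path">We don't want to match this text.</a>
<a href="http://www.example.com/another/path" style="font-family:Arial, Helvetica, 'sans-serif'; color:#838383;font-size:12px; line-height:14px"></a> match this text.<a href="blah">We also don't want to match this text</a>
</body></html>`

// original is the pattern from the question, unchanged.
var original = regexp.MustCompile(`(?is)<a[^>]+href=["'](?P<link>.*?)["']*.?> match this text\.`)

// tightened is a hypothetical alternative (an assumption, not a definitive fix):
// negated character classes keep every piece inside a single tag, and the
// trailing text is captured in "tail" so it can be put back after removal.
var tightened = regexp.MustCompile(
	`(?i)<a\s[^>]*href=["'](?P<link>[^"'>]*)["'][^>]*>[^<]*</a>(?P<tail> match this text\.)`)

// firstLink returns the "link" group of the first match, or "" if none.
func firstLink(re *regexp.Regexp, html string) string {
	m := re.FindStringSubmatch(html)
	if m == nil {
		return ""
	}
	return m[re.SubexpIndex("link")]
}

// stripAnchors removes each matching anchor and keeps the text after it.
func stripAnchors(html string) string {
	return tightened.ReplaceAllString(html, "${tail}")
}

func main() {
	fmt.Printf("original link:  %q\n", firstLink(original, sample))
	fmt.Printf("tightened link: %q\n", firstLink(tightened, sample))
	fmt.Println(stripAnchors(sample))
}
```

If the trailing text should be removed along with the anchor, replacing with an empty string instead of `${tail}` would do that. Anchors with nested markup would not match `[^<]*`, so for those a real HTML parser is probably the safer route.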