I need a regular expression that will find a target word or phrase in HTML (so in amongst tags) but NOT when it is inside an anchor or script tag. I have experimented for ages and came up with this:
(?!<(script|a).*?>)(\btype 2 diabetes\b)(?!<\/(a|script)>)
assuming in this case that the target to replace is "type 2 diabetes".
I thought this would be a common question, but all the references I found are about matching parts of an anchor, not about matching text that is outside any anchor or script tag but in amongst them and other tags.
Below is a piece of test data. I have used both http://regexpal.com/ and http://gskinner.com/RegExr/ with the above expression and the test data below; try as I might, I just cannot exclude the text in the anchor or script tags without also excluding the text between sets of anchor or script tags.
In the test data below, only the "type 2 diabetes" inside the <p></p> should be caught.
<a href="https://www.testsite.org.uk">
<div><img alt="logo" src="/images/logo.png" height="115" width="200" /></div>
<h2>Healthy Living for People with type 2 Diabetes</h2>
</a>
<p>type 2 Diabetes</p>
<a id="logo" href="https://www.help-diabetes.org.uk">
<div><img alt="logo" src="/images/logo.png" height="115" width="200" /></div>
<h2>Healthy Living for People with type 2 Diabetes</h2>
</a>
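
For what it's worth, the lookahead `(?!<(script|a).*?>)` only inspects the text starting at the current position, so it cannot tell whether an opening `<a>` or `<script>` appeared earlier in the document. One workaround that may fit this case is to drop the lookarounds, let the expression consume whole anchor/script blocks (and other tags) first, and only treat a match as a hit when the target group actually captured something. Below is a rough sketch of that idea in Python; the function name `wrap_target`, the `<strong>` wrapper, and the case-insensitive matching are assumptions made for illustration, not part of the original requirement.

```python
import re

# Assumption: match case-insensitively, because the test data writes "type 2 Diabetes".
TARGET = r"\btype 2 diabetes\b"

PATTERN = re.compile(
    r"<(a|script)\b[^>]*>.*?</\1\s*>"  # 1) a whole <a>...</a> or <script>...</script> block
    r"|<[^>]+>"                         # 2) any other single tag, so attribute values are skipped
    r"|(" + TARGET + r")",              # 3) the target phrase in ordinary text (group 2)
    re.IGNORECASE | re.DOTALL,
)


def wrap_target(html, template="<strong>{}</strong>"):
    """Wrap the target phrase, leaving anchor/script blocks and tags untouched.

    `wrap_target` and `template` are hypothetical names used only in this sketch.
    """
    def repl(match):
        phrase = match.group(2)
        if phrase is None:
            # Alternative 1 or 2 matched: put the text back exactly as it was.
            return match.group(0)
        return template.format(phrase)

    return PATTERN.sub(repl, html)


# Shortened version of the test data from the question.
SAMPLE = """<a href="https://www.testsite.org.uk">
<h2>Healthy Living for People with type 2 Diabetes</h2>
</a>
<p>type 2 Diabetes</p>
<a id="logo" href="https://www.help-diabetes.org.uk">
<h2>Healthy Living for People with type 2 Diabetes</h2>
</a>"""

if __name__ == "__main__":
    print(wrap_target(SAMPLE))
```

This is only a sketch under those assumptions: it relies on anchors and scripts being properly closed and not nested, and on attribute values not containing a literal `>`. For arbitrary real-world HTML, a proper HTML parser that walks the text nodes is likely to be more robust than any single expression.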