1

I have some text that contains HTML hyperlinks. I want to remove the hyperlinks, but only specific ones.

e.g. I start with this:

This is text <a href="link/to/somewhere">Link to Remove</a> and more text with another link <a href="/link/to/somewhere/else">Keep this link</a>

I want to have:

This is text and more text with another link <a href="/link/to/somewhere/else">Keep this link</a> 

I have this regular expression:

<a\s[^>]*>.*?</a>

... but it matches ALL of the links.

What do I need to add to that expression to match only the links with the link-text 'Remove' (for example) in it?

Thanks in advance.

Rob
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – thecoop Mar 22 '10 at 15:24

3 Answers

1

You'll probably get a lot of feedback not to use regular expressions on HTML... but if you do decide to use one, try this:

 <a\s[^>]*>.*?Remove.*?</a>

This matches only links where "Remove" appears somewhere in the link text.
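
One caveat worth checking: because `.*?` is allowed to run past a `</a>`, a link that does not contain "Remove" but sits before one that does can get swallowed into the same match. Below is a minimal sketch in Python (not the asker's language; the helper name `remove_links_containing` is made up for illustration) that guards against that with a negative lookahead. It is still a regex over HTML, so treat it as a sketch for simple, well-formed input only.

```python
import re

# Hypothetical helper (not from the answer): removes <a> elements whose
# link text contains `word`.
def remove_links_containing(html: str, word: str) -> str:
    # (?:(?!</a>).)*? advances one character at a time but refuses to step
    # over a closing </a>, so a match cannot start in one link and end in
    # a later one.
    pattern = re.compile(
        r"<a\s[^>]*>(?:(?!</a>).)*?" + re.escape(word) + r"(?:(?!</a>).)*?</a>",
        re.DOTALL,
    )
    return pattern.sub("", html)

text = ('This is text <a href="link/to/somewhere">Link to Remove</a> and more text '
        'with another link <a href="/link/to/somewhere/else">Keep this link</a>')
result = remove_links_containing(text, "Remove")
print(result)
```

Note that removing the link leaves the spaces on both sides of it in place, so the output has a double space after "This is text".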

Keltex
  • Thanks, that got it. And if I wanted to match on 'remove' with case insensitivity, what would I wrap that in? (e.g. match on 'Remove' or 'remove' or 'REMOVE' etc...) – Rob Mar 19 '10 at 04:59
  • @Rob: Pretty sure C# has something like `RegexOptions.IgnoreCase` that you can pass in as another param. – mpen Mar 19 '10 at 05:19
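
For the case-insensitive variant asked about in the comments, here is a rough sketch in Python rather than C#; `re.IGNORECASE` plays the role that `RegexOptions.IgnoreCase` would play in .NET. The sample strings are invented for illustration. Many engines also accept an inline `(?i)` at the start of the pattern as an alternative to passing a flag.

```python
import re

# The answer's pattern, compiled with the case-insensitive flag.
pattern = re.compile(r"<a\s[^>]*>.*?Remove.*?</a>", re.IGNORECASE)

# Made-up inputs covering the three spellings from the comment.
samples = [
    'one <a href="/x">Link to Remove</a> two',
    'one <a href="/x">please remove</a> two',
    'one <a href="/x">REMOVE ME</a> two',
]
cleaned = [pattern.sub("", s) for s in samples]
print(cleaned)
```

The flag applies to the whole pattern, so `<A HREF=...>` written in upper case would match as well.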
0
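# Assumes $str holds the input text. This hard-codes the example's layout:
# one link to drop, then lowercase words and spaces, then the link to keep.
$str =~ /(.*)<a.*<\/a>([a-z ]+ <a.*<\/a>)/;
print "$1$2";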
muruga
0

(.*?)<a.*[Rr]emove.*?a>(.*)

Then reconstruct the string from the two captured groups: $1$2
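
As a sketch of what that reconstruction looks like in practice, here it is in Python, where `$1` and `$2` correspond to `match.group(1)` and `match.group(2)`. The input is the question's example; the pattern is used exactly as written above, so it only handles a single link containing "Remove" or "remove".

```python
import re

text = ('This is text <a href="link/to/somewhere">Link to Remove</a> and more text '
        'with another link <a href="/link/to/somewhere/else">Keep this link</a>')

# Group 1 is everything before the link to drop, group 2 everything after it.
match = re.match(r"(.*?)<a.*[Rr]emove.*?a>(.*)", text)

# Fall back to the unchanged text when no such link is present.
result = (match.group(1) + match.group(2)) if match else text
print(result)
```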

Paul