
I'm using this code to strip hyperlinks from a string, keeping only the link text:

preg_replace('#<a.*?>(.*?)</a>#i', '\1', $text)

How do I do the same thing while keeping links whose URLs match a certain pattern (i.e. start with a domain I want to keep)?

Update

Turns out that the URLs I want to keep include relative URLs, so I now want to remove every link whose URL neither matches the given pattern nor is relative.

lordmj

1 Answer


You need a negative look-ahead assertion:

preg_replace('#<a(?![^>]+?href="?http://keepthisdomain\.com/foo/bar"?).*?>(.*?)</a>#i', '\1', $text);

Edit: If you want to keep relative URLs instead, the logic is the same. Just take out the protocol and domain name:

preg_replace('#<a(?![^>]+?href="?/?path/to/foo/bar"?).*?>(.*?)</a>#i', '\1', $text);

The ? after each " and / makes that character optional. So, the second example will work for either path/to/foo/bar or /path/to/foo/bar.
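The two patterns above each keep one kind of link. To keep both the chosen domain and every relative URL in a single pass, one option (not part of the original answer) is to match every anchor and decide in a callback instead of packing the logic into a lookahead. This is only a rough sketch: the function name strip_foreign_links and the $keepPrefix parameter are made up for illustration, "relative" is assumed to mean "no scheme and no leading //", and regex-based HTML handling will miss unusual markup.

```php
<?php
// Hypothetical helper: unwrap every <a> whose href is neither under
// $keepPrefix nor relative; anchors that qualify are left untouched.
function strip_foreign_links(string $html, string $keepPrefix): string
{
    $result = preg_replace_callback(
        '#<a\b([^>]*)>(.*?)</a>#is',
        function (array $m) use ($keepPrefix): string {
            // Extract the href value (quoted or bare) from the tag's attributes.
            if (!preg_match('#(?:^|\s)href\s*=\s*["\']?([^"\'\s>]*)#i', $m[1], $h)) {
                return $m[2]; // no href: unwrap, as the lookahead patterns would
            }
            $href = $h[1];
            // Kept if it starts with the prefix (case-insensitive) ...
            $isKept = stripos($href, $keepPrefix) === 0;
            // ... or has no scheme ("http:", "mailto:") and is not "//host/...".
            $isRelative = !preg_match('#^(?:[a-z][a-z0-9+.\-]*:|//)#i', $href);
            return ($isKept || $isRelative) ? $m[0] : $m[2];
        },
        $html
    );
    return $result ?? $html; // preg_replace_callback returns null on error
}

$text = '<a href="http://keepthisdomain.com/foo/bar">keep</a> '
      . '<a href="/local/page">rel</a> '
      . '<a href="http://example.org/x">gone</a>';
echo strip_foreign_links($text, 'http://keepthisdomain.com/'), "\n";
```

Doing the check in PHP avoids the backtracking surprises that optional quotes can cause inside a negative lookahead, at the cost of a slightly longer snippet.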

elixenide
  • How do I get this to keep relative urls in addition to the ones matching the keep domain? – lordmj Jan 29 '14 at 22:08
  • Just tweak it; see my edit. Please accept the answer if you found it helpful! – elixenide Jan 29 '14 at 22:13
  • Was also wondering how to replace urls that appear in the text even if they are not hyperlinks. So a free form url would be stripped out as well. Keeping urls that match the keepthisdomain of course. I will be applying the hyperlink strip regex before the free form strip regex – lordmj Jan 31 '14 at 17:08
  • @lordmj The logic here is the same. Just use the first version, without the portions relevant to anchor tags, like so: `preg_replace('#(?!http://keepthisdomain\.com/foo/bar)http://[^<>\s]*#i', '', $text);` Note: this may require some tweaking, because it would apply to *everything*, not just plain text. So, it might nuke the URLs inside some `img` tags, for example. You would need to modify the "good" URL pattern to preserve anything you want to keep. – elixenide Jan 31 '14 at 17:13
  • Revisiting this again. Now I want to remove the same hyperlinks as before, but if a hyperlink has the attribute rel="shadowbox[a]" I want to keep it. – lordmj Mar 05 '14 at 20:47
  • If you have a follow-up question, please post it as a separate question -- that way, there's only one question per page. :) – elixenide Mar 05 '14 at 20:50
  • Ok follow up question posted: http://stackoverflow.com/questions/22209474/replace-all-urls-in-string-not-matching-url-pattern-in-php – lordmj Mar 05 '14 at 21:28
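
The free-form URL idea from the comments can be tried out with a small sketch like the one below. The function name strip_plain_urls and the $keepPrefix parameter are hypothetical, and matching https as well as http is an assumption that goes beyond the comment. Quotes are excluded from the URL characters so that an attribute such as src="..." keeps its closing quote, although the URL inside it is still removed.

```php
<?php
// Hypothetical helper: delete bare http(s) URLs unless they start with $keepPrefix.
function strip_plain_urls(string $text, string $keepPrefix): string
{
    // Escape the prefix so characters like "." are matched literally.
    $keep = preg_quote($keepPrefix, '#');
    // The negative lookahead rejects URLs that begin with the kept prefix.
    $pattern = '#(?!' . $keep . ')\bhttps?://[^<>\s"\']*#i';
    return preg_replace($pattern, '', $text) ?? $text;
}

echo strip_plain_urls(
    'See http://example.org/page and http://keepthisdomain.com/foo/bar now',
    'http://keepthisdomain.com/foo/bar'
), "\n";
```

As the comment warns, this runs over the whole string, so it is probably safest to apply it after the anchor-stripping step and to widen the kept prefix for any other URLs that must survive.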