<a href="http://www.example.com/foo/bar/" title="bar">Example</a>

How can I replace "bar" only in the href attribute?

Thank you!

user557108
  • Do you want to replace all links on the page containing bar? Only that exact link? Can you be a bit more specific about your goal? – Kato Oct 11 '12 at 16:00
  • I want to replace all the links on the page containing bar in the href attribute. – user557108 Oct 11 '12 at 16:02
  • Just [don't do this with regex](http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html) – Junuxx Oct 11 '12 at 16:09
  • @Junuxx why not? not every utility function in the universe needs a complete HTML parser. While parsing HTML with regex is [certainly not ideal](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454), it's also not the end of the world to replace all hrefs with a certain string using regex; sometimes the right tool is the quick and dirty one and the question is very straightforward: how to do it with a regex; not whether a regex is the best tool – Kato Oct 11 '12 at 16:16

2 Answers

Let me start off by saying that regular expressions are not the right tool for this (at least not for finding the attributes; for the replacement itself they are probably fine).

However, here is a regular expression solution anyway. I cannot guarantee that it will work on all valid HTML.

$newStr = preg_replace(
    '/(<a\s+[^>]*href="[^"]*)bar([^"]*"[^>]*>)/',
    '$1newString$2',
    $str);

Some explanation for the regex:

The regex starts with a capturing group that ensures we are inside the href attribute of an a tag; capturing it makes the replacement a bit cleaner. The \s+[^>]* allows other attributes to come first, but does not allow the tag to close. The [^"]* allows more content in the href value to come first, but does not allow the attribute to close. Then comes bar. The second group matches the rest of the href value, the closing quote, and the remainder of the tag up to its closing >.

The replacement then simply reuses the captured groups $1 and $2 (everything around bar, which we had to match to make sure bar is in the right place) and inserts the new string between them, where bar used to be.
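To see the groups at work, here is a small sketch that runs the pattern above on the link from the question. The second link and its `/baz/` path are made up for illustration; they are only there to show that a `bar` in a title attribute alone does not trigger a replacement.

```php
<?php
// Sample input: the link from the question plus a hypothetical second link
// whose href does not contain "bar" (only its title does).
$str = '<a href="http://www.example.com/foo/bar/" title="bar">Example</a>'
     . '<a href="http://www.example.com/baz/" title="bar">Other</a>';

// Same pattern and replacement as in the answer.
$newStr = preg_replace(
    '/(<a\s+[^>]*href="[^"]*)bar([^"]*"[^>]*>)/',
    '$1newString$2',
    $str
);

// Only the href of the first link changes; both title attributes keep "bar".
echo $newStr, PHP_EOL;
```

This should print the first link with `http://www.example.com/foo/newString/` as its href and leave the second link exactly as it was.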

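Note that this will break in particular if an attribute containing > comes before the href attribute. It also only handles double-quoted href values, and it replaces only one occurrence of bar per link.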
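If those limitations matter, a DOM-based version sidesteps them. The following is only a minimal sketch, assuming PHP's bundled DOM extension is available and that the input is a whole page; `replaceInHrefs` is a hypothetical helper name, not something from the question.

```php
<?php
// Hypothetical helper: replaces $search with $replace in the href of every
// <a> element and leaves all other attributes untouched.
function replaceInHrefs(string $html, string $search, string $replace): string
{
    // Collect parser warnings about imperfect markup instead of printing them.
    libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    foreach ($doc->getElementsByTagName('a') as $link) {
        if ($link->hasAttribute('href')) {
            $href = $link->getAttribute('href');
            $link->setAttribute('href', str_replace($search, $replace, $href));
        }
    }

    // Returns the re-serialized document, so the formatting may differ from the input.
    return $doc->saveHTML();
}

// A > inside another attribute and a single-quoted href are both handled by the parser.
$page = '<html><body><a title="a > b" href=\'http://www.example.com/foo/bar/\'>Example</a></body></html>';
echo replaceInHrefs($page, 'bar', 'newString');
```

The trade-off is that the parser rewrites the markup on output, which may or may not be acceptable for a quick utility function.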

Martin Ender
  • Nicely done; are attributes containing `>` actually valid? Is it simply good practice or actual standard that they should be escaped? – Kato Oct 11 '12 at 16:52
  • I think they are allowed. However, this made me realise that my regex does not take single-quoted attribute values into account (omitting quotes altogether is not allowed if slashes are used, which is to be expected for `href` values). – Martin Ender Oct 11 '12 at 19:26

Something like this will work, but the exact approach depends on the goal of the replacement: whether it should be case-insensitive, and whether it should apply to one specific link or to all links on the page (i.e. globally).

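// \1 forces the closing quote to match the opening one; [^"\']* keeps the match inside the href value.
$var = preg_replace('@<a href=(["\'])([^"\']*)/bar/\1@', '<a href=$1$2/foo/$1', '<a href="http://www.example.com/foo/bar/" title="bar">Example</a>');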
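The variations mentioned above could look roughly like this. It is a sketch under assumptions: the two sample links and the replacement `baz` are invented for illustration, and like the one-liner it expects href to be the first attribute of the tag.

```php
<?php
// Hypothetical sample input with two links; note the upper-case BAR in the first.
$html = '<a href="http://www.example.com/foo/BAR/">One</a> '
      . '<a href="http://www.example.com/x/bar/">Two</a>';

// Group 1: everything up to the path segment, group 2: the opening quote,
// group 3: the trailing slash plus the matching closing quote.
// The i modifier makes the match case-insensitive.
$pattern = '@(<a href=(["\'])[^"\']*/)bar(/\2)@i';

// preg_replace replaces every match by default (no /g flag needed in PHP).
$all = preg_replace($pattern, '$1baz$3', $html);

// A fourth argument of 1 limits the replacement to the first matching link.
$first = preg_replace($pattern, '$1baz$3', $html, 1);

echo $all, PHP_EOL, $first, PHP_EOL;
```

Here `$all` has both links rewritten to `/baz/`, while `$first` rewrites only the first link and leaves the second one alone.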
Kato
  • updated for your comments, and also because I munged the /g (that's for JavaScript; PHP defaults to global replace) – Kato Oct 11 '12 at 16:13