
I'm trying to create a function in PHP that searches a string for all <a href> occurrences and, if the title is not set, replaces it with the link text (the text between > and </a>). I don't know the best way to do it; I was thinking of something like:

$s = preg_replace('/<a[^>]*?href=[\'"](.*?)[\'"][^>]*?title=[\'"](.*?)[\'"][^>]*?>(.*?)<\/a>/si','<a href="$1" title="$2">$3</a>',$s);

How can I check in the regex whether $2 is set and, if it isn't, replace it with $3? Also, $3 can be something like img src="..." alt="...", and in that case I would like to use the value of alt instead.

First of all, I would like to know whether this can be done in PHP and how, but any help would be appreciated.

webbiedave
Emanuel O.
  • possible duplicate of [Regex to Parse Hyperlinks and Descriptions](http://stackoverflow.com/questions/26323/regex-to-parse-hyperlinks-and-descriptions) – jeroen Apr 18 '11 at 21:42
  • [Obligatory...](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Justin Morgan - On strike Apr 18 '11 at 21:52

2 Answers


Maybe presume the title is going to be empty and look for title='' only:

echo preg_replace("/<a[^>]*?href=[\'\"](.*?)[\'\"][^>]*?title=''>(.*?)<\/a>/i","<a href='$1' title='$2'>$2</a>","<a href='http://google.com' title=''>Google</a>");

Output:

<a href='http://google.com' title='Google'>Google</a>
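The pattern above only handles an explicit empty title=''. A rough sketch that also covers links with no title attribute at all: the negative lookahead (?![^>]*\btitle=) lets the match proceed only when the opening <a ...> tag contains no title= anywhere. The function name add_missing_titles is hypothetical, and the sketch assumes plain-text link content and no '>' inside attribute values; link text containing markup (such as an <img>) would be copied into the attribute as-is, so that case still needs a callback.

```php
<?php
// Hypothetical helper: give links that have NO title attribute a title copied
// from their link text. The negative lookahead (?![^>]*\btitle=) only lets the
// match continue when the opening <a ...> tag contains no title= at all.
// \b after "<a" keeps tags like <abbr> from matching.
// Assumes plain-text link content and no '>' inside attribute values.
function add_missing_titles(string $html): string
{
    return preg_replace(
        '/<a\b(?![^>]*\btitle=)([^>]*)>(.*?)<\/a>/si',
        '<a$1 title="$2">$2</a>',
        $html
    );
}

echo add_missing_titles('<a href="http://google.com">Google</a>'), "\n";
// <a href="http://google.com" title="Google">Google</a>
```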

Good luck.

EDIT

Sorry, I'm not too sure what you mean by:

also $3 can be something like img src="..." alt="..." and in this case I would like to get the value of alt.

Isn't $3 in your example the link text?

Kit

The uninformative link is somewhat fitting here: this is not easily doable with regular expressions. For example, you cannot use a (?!\4) negative assertion with a forward backreference to compare the title= value against the <img alt= attribute (which is difficult enough to extract already).

At the very least you will have to use preg_replace_callback and handle the replacement in a separate function. There it's easier to break out the attributes and compare alt= against title=.
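A minimal sketch of that preg_replace_callback approach, assuming simple markup (quoted attribute values, no '>' inside them, no nested <a> tags). The function name fill_link_titles is hypothetical. The callback leaves links with a non-empty title alone; otherwise it takes the alt of a nested <img> if there is one, and falls back to the link text with tags stripped.

```php
<?php
// Hypothetical helper: adds a title to each <a> whose title is missing or empty.
// Assumes simple markup: quoted attributes, no '>' inside attribute values,
// and no nested <a> tags.
function fill_link_titles(string $html): string
{
    return preg_replace_callback(
        '/<a\b([^>]*)>(.*?)<\/a>/si',
        function (array $m): string {
            [$whole, $attrs, $inner] = $m;

            // Keep the link unchanged if it already has a non-empty title.
            if (preg_match('/\btitle\s*=\s*([\'"])(.*?)\1/si', $attrs, $t) && trim($t[2]) !== '') {
                return $whole;
            }

            // Prefer the alt text of a nested <img>, else the link text without tags.
            if (preg_match('/<img\b[^>]*\balt\s*=\s*([\'"])(.*?)\1/si', $inner, $alt)) {
                $title = $alt[2];
            } else {
                $title = trim(strip_tags($inner));
            }

            // Drop an existing empty title="" / title='' before adding the new one.
            $attrs = preg_replace('/\s*\btitle\s*=\s*([\'"])\s*\1/i', '', $attrs);

            // Escape quotes and '<', but leave existing entities such as &amp; alone.
            $title = htmlspecialchars($title, ENT_QUOTES, 'UTF-8', false);

            return '<a' . $attrs . ' title="' . $title . '">' . $inner . '</a>';
        },
        $html
    ) ?? $html; // preg_replace_callback returns null on a regex error
}
```

For example, fill_link_titles('<a href="/x"><img src="logo.png" alt="Logo"></a>') should give <a href="/x" title="Logo"><img src="logo.png" alt="Logo"></a>. Splitting the work into several small patterns inside the callback is what avoids the forward-backreference problem described above.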

If you aren't using this for output rewriting, make the task simpler by not using regular expressions. Performance-wise this is not the better choice, but it is easy to do with, e.g., phpQuery or QueryPath:

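$qp = qp($html);
foreach ($qp->find("a") as $a) {
    $title = $a->attr("title");
    // Read the alt attribute of a nested <img> (if any)
    $alt = $a->find("img")->attr("alt");
    // Only fill in the title when it is missing or empty
    if (!$title) { $a->attr("title", $alt); }
}
$html = $qp->top()->writeHtml();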

(The same can be done, only with more elaborate code, using DOMDocument...)
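A rough DOMDocument sketch of the same idea, assuming the DOM extension is available and that $html is a complete document (otherwise loadHTML adds the html/body wrappers and the output includes them). The function name fill_link_titles_dom is hypothetical. Input without a declared charset is treated as ISO-8859-1 by loadHTML, so non-ASCII UTF-8 text may need a charset declaration first.

```php
<?php
// Hypothetical helper: give every <a> without a usable title one, using DOMDocument.
// Prefers the alt text of a nested <img>, otherwise the link's text content.
function fill_link_titles_dom(string $html): string
{
    $doc = new DOMDocument();

    // Real-world HTML often triggers parser warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        // Leave links that already have a non-empty title untouched.
        if (trim($a->getAttribute('title')) !== '') {
            continue;
        }

        $img = $a->getElementsByTagName('img')->item(0);
        if ($img !== null && $img->hasAttribute('alt')) {
            $title = $img->getAttribute('alt');
        } else {
            $title = trim($a->textContent);
        }

        // The serializer escapes the attribute value, so no manual escaping here.
        $a->setAttribute('title', $title);
    }

    return (string) $doc->saveHTML();
}
```

To get a fragment back instead of a whole document, one option is to pass individual nodes to $doc->saveHTML($node) and concatenate the results.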

mario