
I have a textarea that uses CKEditor to generate HTML. I want to ensure that every link the user enters has target="_blank". I thought I'd need two regex replacements: one to change any existing target="..." to target="_blank", and another to insert a target attribute where none exists. I'm not making much progress:

// where the target attribute doesn't exist, add it
$output = preg_replace('/<a(\s*(?!target)([\w\-])+=(["\'])[^"\']+\3)*\s*\/?>/', '<a target="_blank"$1>', $input_lines);

This works in this simple case:

<a href="#">one</a> ---> <a target="_blank" href="#">one</a>

It does not work for <a href="#" alt="hello">one</a>. I'm not sure why, but then I don't normally attempt anything this challenging with regular expressions.
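The failure can be reproduced with a short sketch, assuming the pattern is used exactly as above but written with single-quoted PHP strings (in a double-quoted string, \3 becomes an octal escape and the embedded quotes end the string early). A capturing group repeated with * keeps only the text from its last iteration, so for a link with two attributes $1 holds just alt="hello" and href="#" is dropped from the replacement.

```php
<?php
// Same pattern as in the question, single-quoted so PHP leaves \3, \s, \w alone
$pattern = '/<a(\s*(?!target)([\w\-])+=(["\'])[^"\']+\3)*\s*\/?>/';
$replacement = '<a target="_blank"$1>';

$simple = preg_replace($pattern, $replacement, '<a href="#">one</a>');
$two    = preg_replace($pattern, $replacement, '<a href="#" alt="hello">one</a>');

// Group 1 is repeated by "*", so $1 only holds its last iteration:
// for the second link that is ' alt="hello"', and href="#" is lost.
echo $simple, "\n";
echo $two, "\n";
```

With two attributes the tag still matches, so the problem is not a failed match but the lost attribute in $1.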

Also, how would I replace an existing target="..." (e.g. target="_parent") with target="_blank"?

Wiktor Stribiżew
Martyn
  • An HTML/XML parser would probably be better for this: http://php.net/manual/en/refs.xml.php – chris85 Jun 04 '15 at 14:19
  • I'm just going to leave a link to one of the best answers of all time on Stack Overflow here: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – CD001 Jun 04 '15 at 14:33

2 Answers


You can safely use PHP's DOM extension with XPath to add the target attribute, or overwrite an existing one, on every <a> tag like this:

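<?php
$html = <<<DATA
<a href="somelink.html" target="_blank"><img src="myimage.jpg" alt="alt" title="sometitle" /></a>
<a href="somelink1.php" target="_parent">link_no1</a>
<a href="somelink2.php">link_no2</a>
<a href="someimage.jpg"><img src="image2.png"></a>
DATA;

// LIBXML_HTML_NOIMPLIED stops libxml adding <html>/<body>, but the fragment
// then needs a single root element: without the wrapper <div>, the later
// links end up nested inside the first one.
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);
$links = $xpath->query('//a');

// setAttribute() overwrites an existing target and adds one where it is missing
foreach ($links as $link) {
    $link->setAttribute('target', '_blank');
}

// Serialize the children of the wrapper <div>, not the wrapper itself
$output = '';
foreach ($dom->documentElement->childNodes as $child) {
    $output .= $dom->saveHTML($child);
}
echo $output;

Output:

<a href="somelink.html" target="_blank"><img src="myimage.jpg" alt="alt" title="sometitle"></a>
<a href="somelink1.php" target="_blank">link_no1</a>
<a href="somelink2.php" target="_blank">link_no2</a>
<a href="someimage.jpg" target="_blank"><img src="image2.png"></a>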
Wiktor Stribiżew

Slightly different approach.

First remove all existing target="..." attributes. Perhaps replace \btarget="[^"]*" with nothing or with a single space.

Next add the wanted target="_blank" attributes. Perhaps replace <a\b with <a target="_blank" (the \b stops it from matching other tags that start with "a", such as <abbr> or <area>).
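The two steps above could be sketched in PHP roughly as follows. This is a minimal sketch under a few assumptions: forceBlankTargets() is a hypothetical helper name, only double-quoted target values are handled, and step 1 uses \s+target= instead of \btarget= so that an attribute such as data-target is not partly stripped (\b also matches right after the hyphen).

```php
<?php
// Hypothetical helper illustrating the two-step regex approach.
// Fragile on arbitrary HTML; a DOM parser is the safer choice.
function forceBlankTargets(string $html): string
{
    // Step 1: drop any existing target="..." attribute (double-quoted values only).
    // \s+ rather than \b, so data-target="..." is left alone.
    $html = preg_replace('/\s+target="[^"]*"/i', '', $html);

    // Step 2: add target="_blank" to every <a> tag; \b keeps <abbr>, <area> etc. untouched
    return preg_replace('/<a\b/i', '<a target="_blank"', $html);
}

$input = '<a href="1.php" target="_parent">one</a> <a href="2.php">two</a> <abbr>x</abbr>';
$result = forceBlankTargets($input);
echo $result, "\n";
```

Both links end up with a single target="_blank", while the <abbr> element is left unchanged.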

But beware that these replacements can also change text in unexpected places in the file. As the comments on the question say, it is almost always better to use a proper HTML/XML parser.

AdrianHHH