
I am pulling HTML from a database and displaying it on a webpage using PHP. Let's just say that the people who will be entering the HTML are not very experienced and will probably struggle even with creating things such as links. I need all the links to open in new pages (which I will do by adding target="_blank" inside the <a> tags). Now the question: what is the best way to do this?

As an example, the following HTML in the database:

<a href="www.google.com">link</a>

should be output as:

<a href="www.google.com" target="_blank">link</a>

I currently have this line to do what I want:

$text = preg_replace('/(<a.*?)>/', '$1 target="_blank">', $text);

But as I know from this answer and many others on SO, using regex on HTML is not advised. Is there a better way? Using an HTML/XML parser, etc. seems over the top for such a simple operation.
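To make the objection concrete, the sketch below runs the question's own pattern, unchanged, against three inputs an inexperienced editor could plausibly produce. The helper name `regex_add_target` is hypothetical; only the `preg_replace()` call comes from the question.

```php
<?php
// Hypothetical helper wrapping the question's pattern, copied unchanged.
function regex_add_target(string $text): string
{
    return preg_replace('/(<a.*?)>/', '$1 target="_blank">', $text);
}

$inputs = [
    // Any tag starting with "<a" matches, not just links:
    // becomes <abbr title="HyperText" target="_blank">HTML</abbr>
    '<abbr title="HyperText">HTML</abbr>',
    // A link that already has a target gets a second, conflicting one:
    // becomes <a href="x.html" target="_self" target="_blank">x</a>
    '<a href="x.html" target="_self">x</a>',
    // The pattern is case-sensitive, so uppercase tags are left untouched.
    '<A HREF="x.html">x</A>',
];

foreach ($inputs as $html) {
    echo regex_add_target($html), "\n";
}
```

Each case can be patched with a cleverer pattern, but every patch is another special case, which is the usual argument for reaching for a parser instead.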

Community
carloabelli
  • Never use regexes on html, especially if you're trying to CHANGE the html. Use DOM, and then it's a simple `$node->setAttribute('target', '_blank');` – Marc B Jul 30 '14 at 14:53
  • Is javascript out of the question? – dmgig Jul 30 '14 at 14:54
  • I am by no means an expert on web security, but storing and retrieving HTML from a database sounds like a bad idea. A user could easily submit malicious JavaScript and other code. I hope you know what you are doing :\ – Jason Jul 30 '14 at 14:54
  • @MarcB Never say never... – giorgio Jul 30 '14 at 14:58
  • @jason: so all of the forums and blogs and CMS systems out there are a bad idea? Databases don't care what you're storing in them. It's just data. What's dangerous is HOW you get that into the database. – Marc B Jul 30 '14 at 14:59
  • @Jason The database is basically a new CMS and is edited by the same people who were writing the actual PHP before. If they really wanted to be malicious they would have already done so :). – carloabelli Jul 30 '14 at 15:01
  • @MarcB Could you post an answer describing in more detail how to do this with DOM? – carloabelli Jul 30 '14 at 15:01
  • @MarcB That was essentially my point. If any user with privileges to submit to the database can write any HTML, what's stopping them from placing some ` – Jason Jul 30 '14 at 15:08
  • @cabellicar123 Ah then you probably don't have anything to worry about. I thought any user would be able to submit code like a forum. – Jason Jul 30 '14 at 15:10

2 Answers


The better way would be to do this in JavaScript and let the client handle it. If you are using jQuery, it is as easy as:

$(document).ready(function() {
    $('a').attr('target', '_blank');
});

Or in vanilla JavaScript:

window.onload = function() {
    var a = document.getElementsByTagName('a');
    for (var i = 0, l = a.length; i < l; i++) {
        a[i].setAttribute('target', '_blank');
    }
};

Or, if you really need it to be done in PHP:

$html = "YOUR HTML STRING";
$doc = new DOMDocument();
// loadHTML() parses into $doc and returns true/false, not a new document
$doc->loadHTML($html);

foreach ($doc->getElementsByTagName('a') as $link) {
    $link->setAttribute('target', '_blank');
}
$html = $doc->saveHTML();

That should work.
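`loadHTML()` treats its input as a complete document, so `saveHTML()` with no argument returns the links wrapped in `<!DOCTYPE>`, `<html>` and `<body>` tags. For a fragment stored in a database, a sketch along these lines may fit better. It assumes the stored HTML is UTF-8 and wraps it in a `<div>` so that only that container's children are serialized back; `add_target_blank` is a hypothetical name.

```php
<?php
// Hypothetical helper: adds target="_blank" to every <a href> in an HTML
// fragment and returns the fragment without <html>/<body> wrappers.
function add_target_blank(string $fragment): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings about malformed markup instead of printing them.
    $previous = libxml_use_internal_errors(true);

    // Assumption: input is UTF-8. The meta tag says so explicitly, since libxml
    // may otherwise assume ISO-8859-1. The <div> is a known container.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $fragment . '</div>'
    );

    // The wrapper is the first <div> in document order.
    $wrapper = $doc->getElementsByTagName('div')->item(0);

    foreach ($wrapper->getElementsByTagName('a') as $link) {
        // Skip <a name="..."> anchors that are not links.
        if ($link->hasAttribute('href')) {
            // setAttribute() overwrites an existing target instead of duplicating it.
            $link->setAttribute('target', '_blank');
        }
    }

    // Serialize only the wrapper's children, i.e. the original fragment.
    $html = '';
    foreach ($wrapper->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $html;
}

echo add_target_blank('<a href="www.google.com">link</a>'), "\n";
```

Unlike the regex, this leaves tags such as `<abbr>` alone and replaces an existing `target` rather than adding a second one.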

Jonathan Crowe

But as I know from this answer and many others on SO, using regex on HTML is not advised.

Correct

Is there a better way?

Yes. Use an HTML parser.

Using an HTML/XML parser, etc. seems over the top for such a simple operation.

'Adding target="_blank" to some links' sounds like a simple operation, but the real operation is "taking some user input, which might be HTML but is probably tag soup full of errors, and trying to edit it according to some predefined rules".

Use an HTML parser. This is what they are designed for.
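As a small illustration of that point, the sketch below feeds PHP's `DOMDocument` (which uses libxml2's HTML parser) some typical tag soup. The parser builds a tree anyway, lowercasing tag and attribute names and closing elements implicitly, so the links can still be found. `link_hrefs` is a hypothetical helper; `libxml_use_internal_errors(true)` only keeps libxml's warnings about the broken markup from being printed.

```php
<?php
// Hypothetical helper: parse arbitrary markup and list the href of every link.
function link_hrefs(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // don't print parse warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        $hrefs[] = $link->getAttribute('href');
    }
    return $hrefs;
}

// Typical hand-written tag soup: an unquoted attribute value, uppercase tag
// and attribute names, and <p> and <a> elements that are never closed.
$soup = '<p>Visit <a href=www.google.com>Google<p>or <A HREF="page.html">this page</A>';

print_r(link_hrefs($soup));
```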

Quentin