15

I have an HTML string that contains exactly one a element. Example:

   <a href="http://www.test.com" rel="nofollow external">test</a>

In PHP I need to test whether rel contains external and, if so, modify href and save the string.

I have looked at DOM nodes and objects, but they seem like too much for a single a element: I have to iterate to get the HTML nodes, and I am not sure how to test whether rel exists and contains external.

$html = new DOMDocument();
$html->loadHtml($txt);
$a = $html->getElementsByTagName('a');
$attr = $a->item(0)->attributes; // attributes is a property (a DOMNamedNodeMap), not a method
...

At this point I get a DOMNamedNodeMap, which seems like overkill. Is there a simpler way to do this, or should I do it with DOM?

Linda
  • When dealing with the DOM you have two options: 1) use the native DOM parser, or 2) use a regular expression (which is overkill) – Yang Apr 21 '13 at 01:47
  • Keep going. Use `DOMDocument()` for manipulation – Yang Apr 21 '13 at 01:48
  • Nobody should use the raw DOM methods for manipulation. Consider phpQuery or QueryPath etc. to reduce tedious boilerplate. – mario Apr 21 '13 at 01:48

4 Answers

13

Is there any simpler way for this or should I do it with DOM?

Do it with DOM.

Here's an example:

<?php
$html = '<a href="http://example.com" rel="nofollow external">test</a>';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Pad the whitespace-normalized rel value with spaces so that ' external '
// matches only a whole token, not a substring of some longer token.
$nodes = $xpath->query("//a[contains(concat(' ', normalize-space(@rel), ' '), ' external ')]");
foreach($nodes as $node) {
    $node->setAttribute('href', 'http://example.org');
}
echo $dom->saveHTML();
uınbɐɥs
  • `$dom->saveHTML();` This method, as of 5.2.6, will automatically add <html> and <body> tags to the document if they are missing, without asking whether you want them. – Marcin Jaworski Oct 18 '19 at 06:40
  • Some query explanation would be beneficial to researchers. – mickmackusa Dec 18 '19 at 10:18
  • @MarcinJaworski Thanks for the heads up – looks like the fix is to pass some flags to `loadHtml`: https://stackoverflow.com/questions/4879946/how-to-savehtml-of-domdocument-without-html-wrapper – Illya Moskvin Jan 24 '20 at 22:35
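
The flags mentioned in the last comment could be applied roughly as follows. This is a minimal sketch, assuming PHP 5.4 or later built against a libxml version that supports `LIBXML_HTML_NOIMPLIED` and `LIBXML_HTML_NODEFDTD`; the URLs are placeholders carried over from the answer. The XPath expression is the one from the answer: it pads the normalized rel value with spaces so that ' external ' is matched as a whole token.

```php
<?php
$html = '<a href="http://example.com" rel="nofollow external">test</a>';

$dom = new DOMDocument;
// Ask the parser not to add implied <html>/<body> elements or a default doctype.
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);
// Space-padded token match on rel, as in the answer above.
$query = "//a[contains(concat(' ', normalize-space(@rel), ' '), ' external ')]";
foreach ($xpath->query($query) as $node) {
    $node->setAttribute('href', 'http://example.org');
}

// trim() drops any trailing newline that saveHTML() may append.
$out = trim($dom->saveHTML());
echo $out;
```

With a single root element, as in the question, this should print only the modified a element. Fragments with several top-level nodes may behave differently under `LIBXML_HTML_NOIMPLIED`, so that case is worth testing separately.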
2

I kept going with DOM. This is what I ended up with:

$html = new DOMDocument();
$html->loadHtml('<?xml encoding="utf-8" ?>' . $txt);
$nodes = $html->getElementsByTagName('a');
foreach ($nodes as $node) {
    foreach ($node->attributes as $att) {
        if ($att->name == 'rel') {
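            // strpos() returns 0 (falsy) when the value starts with "external", so compare against false
            if (strpos($att->value, 'external') !== false) {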
                $node->setAttribute('href','modified_url_goes_here');
            }
        }
    }
}
$txt = $html->saveHTML();

I did not want to load any other library for just this one string.
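
A shorter variant of the same idea, still without any extra library, might read the attribute directly instead of looping over all attributes. This is only a sketch: `$newHref` and the sample input are hypothetical, and it assumes PHP 5.3.6 or later so that `saveHTML()` accepts a node argument.

```php
<?php
// Hypothetical input and target URL, for illustration only.
$txt = '<a href="http://www.test.com" rel="nofollow external">test</a>';
$newHref = 'http://example.org';

$doc = new DOMDocument();
$doc->loadHTML($txt);

foreach ($doc->getElementsByTagName('a') as $a) {
    // getAttribute() returns '' when rel is missing, so no separate existence check is needed.
    $tokens = preg_split('/\s+/', trim($a->getAttribute('rel')), -1, PREG_SPLIT_NO_EMPTY);
    // Compare whole tokens rather than searching for a substring.
    if (in_array('external', $tokens, true)) {
        $a->setAttribute('href', $newHref);
    }
}

// Serialize only the a element, which leaves out the wrapper elements added by loadHTML().
$firstLink = $doc->getElementsByTagName('a')->item(0);
$result = $doc->saveHTML($firstLink);
echo $result;
```

Splitting rel into tokens means a value such as "externals" would not count as a match, which a plain `strpos()` check would accept.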

Linda
1

The best way is to use an HTML parser/DOM, but here's a regex solution:

$html = '<a href="http://www.test.com" rel="nofollow external">test</a><br>
<p> Some text</p>
<a href="http://test.com">test2</a><br>
<a rel="external">test3</a> <-- This won\'t work since there is no href in it.
';

$new = preg_replace_callback('/<a.+?rel\s*=\s*"([^"]*)"[^>]*>/i', function($m){
    if(strpos($m[1], 'external') !== false){
        $m[0] = preg_replace('/href\s*=\s*(("[^"]*")|(\'[^\']*\'))/i', 'href="http://example.com"', $m[0]);
    }
    return $m[0];
}, $html);

echo $new;


HamZa
0

You could use a regular expression: if the string matches /\s+rel\s*=\s*".*external.*"/, then do a regex replace like /(<a.*href\s*=\s*")([^"]*)("[^>]*>)/\1[your new href here]\3/

That said, using a library that can do this kind of thing for you is much easier (like jQuery for JavaScript).
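
In PHP, the two-step idea above could be sketched with `preg_match` and `preg_replace_callback`. The patterns here are tightened variants of the ones in this answer, not the exact originals, and `$txt` and `$newHref` are hypothetical values. The sketch assumes double-quoted attributes and a single a element, as in the question.

```php
<?php
// Hypothetical input and replacement URL, chosen for illustration only.
$txt = '<a href="http://www.test.com" rel="nofollow external">test</a>';
$newHref = 'http://example.org';

// Step 1: check that a rel attribute exists and mentions "external".
if (preg_match('/\srel\s*=\s*"[^"]*\bexternal\b[^"]*"/i', $txt)) {
    // Step 2: swap the href value, keeping the text around it.
    // A callback avoids surprises if the new URL contains "$" or "\".
    $txt = preg_replace_callback(
        '/(<a\b[^>]*?\shref\s*=\s*")[^"]*(")/i',
        function ($m) use ($newHref) {
            return $m[1] . $newHref . $m[2];
        },
        $txt,
        1
    );
}
echo $txt;
```

Single-quoted or unquoted attributes would slip past these patterns, which is one reason the other answers recommend a DOM parser.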

anthonybell