I need a function that finds ALL outgoing links within a given HTML text and adds the attribute rel="nofollow" to each of them. Only outgoing links should be changed; internal links must stay untouched.

Example: My domain is www.laptops.com

$myDomain = "www.laptops.com";

$html = 
 'Hello World have a look at <a href="www.laptops.com/apple">Apple Laptops</a>. 
  For more ino go to <a href="www.apple.com">Apple.com</a> 
  or to <a href="www.appleblog.com">Appleblog.com</a>';

function correct($html,$myDomain){ 
    //get all links by filtering '<a ... href="{$link}" .....>' and 
    //check with isOutgoing($href,$myDomain )
}

$newHTML = correct($html,$myDomain);

echo $newHTML;

//Hello World have a look at <a href="www.laptops.com/apple">Apple Laptops</a>. 
//For more ino go to <a rel="nofollow" href="www.apple.com">Apple.com</a> 
//or to <a rel="nofollow" href="www.appleblog.com">Appleblog.com</a> 

So far I have a function isOutgoing($link), which detects whether a link is outgoing. The problem is finding ALL '<a ... href="{$link}" ...>' parts of the HTML text and extracting {$link} from each of them. I know that it should be possible with preg_match(), but I have no idea how to do it.

jb7AN
  • *"I know that it should be possible with preg_match()"* .... [don't do it](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) ... don't even think it, he'll hear you! – CD001 Jul 13 '18 at 13:37
  • does it need to be PHP? It's insanely easy in jQuery – delboy1978uk Jul 13 '18 at 13:54

2 Answers

Avoid regex for this. Use DOMDocument and DOMXPath instead, which parse the HTML and let you read and change attributes directly.

<?php
$dom = new DOMDocument();

$dom->loadHtml('
Hello World have a look at <a href="www.laptops.com/apple">Apple Laptops</a>. 
  For more ino go to <a href="www.apple.com">Apple.com</a> 
  or to <a href="www.appleblog.com">Appleblog.com</a>
', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);

foreach ($xpath->query("//a") as $link) {
    $href = $link->getAttribute('href');

    // href does not contain www.laptops.com, so treat the link as outgoing and add the rel attribute
    if (strpos($href, 'www.laptops.com') === false) {
        $link->setAttribute("rel", "nofollow noopener");
    }
}

echo $dom->saveHTML();

Result:

<p>Hello World have a look at <a href="www.laptops.com/apple">Apple Laptops</a>. 
  For more ino go to <a href="www.apple.com" rel="nofollow noopener">Apple.com</a> 
  or to <a href="www.appleblog.com" rel="nofollow noopener">Appleblog.com</a>
</p>

https://3v4l.org/DseDi
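
A minimal sketch of how the loop above could be wrapped into the correct($html, $myDomain) function from the question. The helper name isOutgoingHref() and the host comparison via parse_url() are illustrative assumptions, not part of the original answer. Comparing hosts avoids treating a URL such as www.evil.com/?ref=www.laptops.com as internal, which a plain strpos() check would do. The sketch also assumes that scheme-less links like "www.apple.com" are meant as hosts, as in the question's sample data; a genuinely relative link such as "page.html" would be misread as a host under that assumption.

```php
<?php
// Hypothetical helper: true when $href points to a host other than $myDomain.
function isOutgoingHref(string $href, string $myDomain): bool
{
    $href = trim($href);
    // Empty and fragment-only links stay on the current page.
    if ($href === '' || $href[0] === '#') {
        return false;
    }
    // Non-web schemes are not treated as outgoing links here.
    if (preg_match('~^(mailto|tel|javascript|data):~i', $href)) {
        return false;
    }
    // Root-relative paths ("/apple") are internal; "//host/path" is not.
    if ($href[0] === '/' && substr($href, 0, 2) !== '//') {
        return false;
    }
    // Assumption: a link without a scheme starts with a host name,
    // so prepend a scheme to let parse_url() find the host.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $href)) {
        $href = 'http://' . ltrim($href, '/');
    }
    $host = parse_url($href, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // unparsable: leave the link untouched
    }
    return strcasecmp($host, $myDomain) !== 0;
}

function correct(string $html, string $myDomain): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // The wrapper <div> gives the fragment a single root element,
    // which LIBXML_HTML_NOIMPLIED needs to keep the markup intact.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $link) {
        if (isOutgoingHref($link->getAttribute('href'), $myDomain)) {
            // Note: this overwrites any existing rel value.
            $link->setAttribute('rel', 'nofollow');
        }
    }

    // Serialize only the children of the wrapper, not the <div> itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Called as correct($html, "www.laptops.com") on the question's sample text, this should leave the www.laptops.com/apple link unchanged and add rel="nofollow" to the other two links.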

Lawrence Cherone
  • This. If it HAS to be in PHP! DOM classes rock! – delboy1978uk Jul 13 '18 at 13:58
  • Thank you, worked for me. I needed to add some code for UTF-8 handling and error suppression: `$sHTML = mb_convert_encoding($sRawValue, 'HTML-ENTITIES', 'UTF-8'); $libxml_previous_state = libxml_use_internal_errors(true); $oDom->loadHtml($sHTML, LIBXML_HTML_NODEFDTD | LIBXML_NOERROR | LIBXML_NOWARNING); /* handle errors */ libxml_clear_errors(); /* restore */ libxml_use_internal_errors($libxml_previous_state);` – jb7AN Jul 18 '18 at 10:12
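
The UTF-8 and error-suppression steps from the comment above can be sketched as a small loader. The function name loadUtf8Html() is hypothetical, and the sketch assumes the input is valid UTF-8 and that the mbstring extension is available. Because mb_convert_encoding() with 'HTML-ENTITIES' is deprecated as of PHP 8.2, the sketch swaps in mb_encode_numericentity(), which is a different call from the one in the comment.

```php
<?php
// Hypothetical helper: loads a UTF-8 HTML fragment into a DOMDocument
// while keeping libxml parse warnings out of the output.
function loadUtf8Html(string $rawHtml): DOMDocument
{
    // Turn every non-ASCII character into a numeric entity so the HTML
    // parser cannot misread the UTF-8 bytes as another encoding.
    $html = mb_encode_numericentity($rawHtml, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    // Collect parse errors internally instead of emitting PHP warnings.
    $previousState = libxml_use_internal_errors(true);
    $dom->loadHTML($html, LIBXML_HTML_NODEFDTD);
    // Discard the collected errors, then restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previousState);

    return $dom;
}
```

The returned document can then be walked with the same loop over the a elements as in the answer.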
This would be so much easier with a bit of jQuery.

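<script type="text/javascript">
$(document).ready(function(){
    $('a').each(function(){
        // attr() returns the href as written; prop() would return the resolved absolute URL
        let href = $(this).attr('href') || '';
        // indexOf() returns -1 when 'laptops.com' is absent, i.e. the link is outgoing
        if (href.indexOf('laptops.com') === -1) {
            $(this).attr('rel', 'nofollow');
        }
    });
});
</script>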
delboy1978uk