
I've found many PHP scripts that convert URLs in text to clickable links, but most of them don't work well and some introduce serious bugs. Some convert links that are already clickable, others don't work at all, and others turn parts of the ordinary text into links. I need a script that detects only URLs, not the surrounding text, and that does not convert links that are already clickable, because the result looks very ugly.

I found this code, which seems to be the best of those I've tested, but it has some bugs: it converts links that are already clickable. For example:

Original:

<a href="http://www.netload.in/dateiySgPP2b14W/1409423417ExpFut.pdf.htm" target="_blank">http://www.netload.in/dateiySgPP2b14W/1409...7ExpFut.pdf.htm</a>

Converted:

http://www.netload.in/dateiySgPP2b14W/1409423417ExpFut.pdf.htm" target="_blank">http://www.netload.in/dateiySgPP2b14W/1409...7ExpFut.pdf.htm 

Here is the code:

function parse_urls($text, $maxurl_len = 35, $target = '_self') // Make URLs Clickable
{
    if (preg_match_all('/((ht|f)tps?:\/\/([\w\.]+\.)?[\w-]+(\.[a-zA-Z]{2,4})?[^\s\r\n\(\)"\'<>\,\!]+)/si', $text, $urls))
    {
        $offset1 = ceil(0.65 * $maxurl_len) - 2;

        $offset2 = ceil(0.30 * $maxurl_len) - 1;

        foreach (array_unique($urls[1]) AS $url)
        {
            if ($maxurl_len AND strlen($url) > $maxurl_len)
            {
                $urltext = substr($url, 0, $offset1) . '...' . substr($url, -$offset2);
            }
            else
            {
                $urltext = $url;
            }

            $text = str_replace($url, '<a href="'. $url .'" target="'. $target .'" title="'. $url .'">'. $urltext .'</a>', $text);
        }
    }

    return $text;
}
dnagirl

3 Answers


I just threw this together.

<?php
function replaceUrlsWithLinks($text){
    $dom = new DOMDocument;
    $dom->loadXML($text);
    $xpath = new DOMXpath($dom);
    $query = $xpath->query('//text()[not(ancestor-or-self::a)]');
    foreach($query as $item){
        $content = $item->textContent;
        if(preg_match_all('/((ht|f)tps?:\/\/([\w\.]+\.)?[\w-]+(\.[a-zA-Z]{2,4})?[^\s\r\n\(\)"\'<>\,\!]+)/si',$content,$matches,PREG_SET_ORDER | PREG_OFFSET_CAPTURE)){
            foreach($matches as $match){
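                // createElement() does not escape its value, so add the URL as a text node ("&" stays valid).
                $newA = $dom->createElement('a');
                $newA->appendChild($dom->createTextNode($match[0][0]));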
                $newA->setAttribute('href',$match[0][0]);
                $newA->setAttribute('target','_blank');
                $a = $item->splitText($match[0][1]);
                $b = $a->splitText(strlen($match[0][0]));
                $a->parentNode->replaceChild($newA,$a);
            }
        }
    }
    return $dom->saveHtml();
}
// The HTML to process ...
$html = <<<HTML
<block>
<a href="http://google.com">http://google.com</a>
<b>Stuff http://google.com</b>
asdf http://google.com ffaa 
</block>
HTML;
// Process the HTML and echo it out.
echo replaceUrlsWithLinks($html);
?>

The output would be:

<block>
<a href="http://google.com">http://google.com</a>
<b>Stuff <a href="http://google.com" target="_blank">http://google.com</a></b>
asdf <a href="http://google.com" target="_blank">http://google.com</a> ffaa 
</block>

You shouldn't use regular expressions to manipulate HTML.

Hope this helps.

Kyle

-- Edit --

The previous code is more efficient, but if you plan to have two URLs in the same parent node, the code will break because the DOM tree is changed. To fix this, you can use this more intensive code:

<?php
function replaceUrlsWithLinks($text){
    $dom = new DOMDocument;
    $dom->loadXML($text);
    $xpath = new DOMXpath($dom);
    // Re-run the query after every replacement: splitText() and replaceChild()
    // change the tree, so offsets taken from the old text node are no longer valid.
    do {
        $replaced = false;
        $query = $xpath->query('//text()[not(ancestor-or-self::a)]');
        foreach($query as $item){
            $content = $item->textContent;
            // Only the first URL in this node is handled per pass.
            // Note: preg offsets are byte offsets, so this assumes single-byte text before the URL.
            if(preg_match('/((ht|f)tps?:\/\/([\w\.]+\.)?[\w-]+(\.[a-zA-Z]{2,4})?[^\s\r\n\(\)"\'<>\,\!]+)/si',$content,$match,PREG_OFFSET_CAPTURE)){
                $url = $match[0][0];
                // createElement() does not escape its value, so add the URL as a text node.
                $newA = $dom->createElement('a');
                $newA->appendChild($dom->createTextNode($url));
                $newA->setAttribute('href',$url);
                $newA->setAttribute('target','_blank');
                $a = $item->splitText($match[0][1]);
                $b = $a->splitText(strlen($url));
                $a->parentNode->replaceChild($newA,$a);
                $replaced = true;
                break;
            }
        }
    } while($replaced);
    return $dom->saveHtml();
}

$html = <<<HTML
<block>
<a href="http://google.com">http://google.com</a>
<b>Stuff http://google.com</b>
asdf http://google.com ffaa  http://google.com
</block>
HTML;

echo replaceUrlsWithLinks($html);
?>
Kyle

This function wraps plain text such as http://www.domain.com in an anchor tag. What I see here is that you are feeding it text that already contains anchor tags, so it tries to convert an anchor into an anchor, which of course won't work. So don't write the anchors in your text; let the script create them for you.
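
To see why this happens, here is a minimal sketch that runs the pattern from the question against the "Original" anchor shown there. Nothing in it is new apart from the variable names; it only prints what the pattern matches. Because the URL inside `href="..."` is matched as well, `str_replace()` in `parse_urls()` wraps it in a second anchor, which produces the broken output.

```php
<?php
// The pattern from the question, applied to the "Original" anchor above.
$pattern = '/((ht|f)tps?:\/\/([\w\.]+\.)?[\w-]+(\.[a-zA-Z]{2,4})?[^\s\r\n\(\)"\'<>\,\!]+)/si';
$text = '<a href="http://www.netload.in/dateiySgPP2b14W/1409423417ExpFut.pdf.htm" target="_blank">'
      . 'http://www.netload.in/dateiySgPP2b14W/1409...7ExpFut.pdf.htm</a>';

preg_match_all($pattern, $text, $urls);

// $urls[1] holds two matches: the URL inside href="..." and the shortened
// URL used as the link text. parse_urls() then wraps both in new <a> tags.
foreach ($urls[1] as $url) {
    echo $url, "\n";
}
```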

Dirk McQuickly
  • The script is automated, and some of the texts already contain anchor tags. It would be great if they didn't, but unfortunately they do. The other option is to remove the anchor tags. – Tencho Tenchev Jul 26 '12 at 16:39
  • Ok, I did not know that. If you go the way of removing the tags, beware of constructions like `click here`: you will have to fetch both the URL and the link text. – Dirk McQuickly Jul 26 '12 at 16:51
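
If you do go the route of removing the existing anchors first, something along these lines might work. It is only a sketch: the function name `unwrapAnchors`, the wrapper id `unwrap-root` and the "text (url)" output format are my own assumptions, and it assumes plain ASCII or entity-encoded input. It keeps both the URL and the link text when they differ.

```php
<?php
// Hypothetical helper: flatten existing <a> elements to plain text.
function unwrapAnchors(string $html): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy markup
    // Wrap the fragment in a known root so it can be found again after parsing.
    $dom->loadHTML('<div id="unwrap-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $root = $xpath->query('//div[@id="unwrap-root"]')->item(0);

    // Copy the node list to an array first, because the loop modifies the tree.
    foreach (iterator_to_array($xpath->query('.//a[@href]', $root)) as $a) {
        $url = $a->getAttribute('href');
        $label = trim($a->textContent);
        // Keep the link text when it carries information beyond the URL itself.
        $plain = ($label === '' || $label === $url) ? $url : $label . ' (' . $url . ')';
        $a->parentNode->replaceChild($dom->createTextNode($plain), $a);
    }

    // Serialize only the children of the wrapper, not the added html/body.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo unwrapAnchors('See <a href="http://example.com/doc">click here</a> or <a href="http://google.com">http://google.com</a>.');
```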

You're running into the usual problems that happen when you try to parse HTML with regexes. You need a proper HTML parser. Have a look at this thread.
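
As a rough illustration of the parser approach, here is a sketch using PHP's DOMDocument. The function name `linkifyHtml`, the wrapper id `linkify-root` and the simplified URL pattern are assumptions of mine, not taken from the question, and it assumes ASCII or entity-encoded input. Only text nodes that are not inside an existing `<a>` are touched, so already clickable links are left alone.

```php
<?php
// Hypothetical helper: linkify bare URLs, skipping text already inside <a>.
function linkifyHtml(string $html): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy markup
    // Wrap the fragment in a known root so it can be found again after parsing.
    $dom->loadHTML('<div id="linkify-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $root = $xpath->query('//div[@id="linkify-root"]')->item(0);
    $pattern = '~(https?://[^\s<>"\']+)~i';   // simplified pattern, an assumption

    // Copy the node list to an array first, because the loop modifies the tree.
    foreach (iterator_to_array($xpath->query('.//text()[not(ancestor::a)]', $root)) as $node) {
        // With DELIM_CAPTURE, odd indexes are URLs and even indexes are plain text.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue;   // no URL in this text node
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $a = $dom->createElement('a');
                $a->setAttribute('href', $part);
                $a->appendChild($dom->createTextNode($part));
                $fragment->appendChild($a);
            } elseif ($part !== '') {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper, not the added html/body.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo linkifyHtml('<a href="http://google.com">http://google.com</a> and http://example.com/page here');
```

Building a fragment from the split parts handles several URLs in one text node in a single pass, without re-querying the tree.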

Community
dnagirl