
There are many similar questions, but I still have not found a solution for what I am trying to achieve in PHP. I run preg_match_all on a string that can contain URLs written in various ways, but that also contains text which should not match. What I need to match is:

www.something.com 
https://something.com
http://something.com
https://www.something.com
http://www.something.com

And any /..../.... after the URL, but not:

www.something.com</p> // this should match everything until the '</p>'
www.something.com. // this should match everything until the '.'

What I have so far is

/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z0-9\&\.\/\?\:@\-_=#])*/

and the code that uses it

if (preg_match_all("/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z0-9\&\.\/\?\:@\-_=#])*/", $text, $urls)) {
    foreach ($urls[0] as $url) {
        $text = str_replace($url, '<a href="' . $url . '">' . $url . '</a>', $text);
    }
}

but this causes a problem with http://www.... (the http:// is not included in the displayed text), and for a URL without http or https the generated link is relative to the domain the page is served from. Suggestions?

Here's a live Demo

Edit: my best regex so far for any URL with http or https is /(http|https)\:\/\/[a-zA-Z0-9\-\.]+(\.[a-zA-Z]{2,3})?(\/[A-Za-z0-9-._~!$&()*+,;=:]*)*/. Now I just need a way to match the URLs that start with only www.something... and turn them into http://www.something... in the href.

Here's another live demo with different examples.

Edit 2: the answer from this question is quite good. The only problems I still encounter with it are a </p> directly after the URL, and text with words directly before and after a dot (this.for example).

$url = '@(http)?(s)?(://)?(([a-zA-Z])([-\w]+\.)+([^\s\.]+[^\s]*)+[^,.\s])@';
$string = preg_replace($url, '<a href="http$2://$4" target="_blank" title="$0">$0</a>', $string);
echo $string;
Dirk J. Faber

3 Answers


Maybe this one fits your needs:

$text = preg_replace_callback('~(https?://|www)[a-z\d.-]+[\w/.?=&%:#]*\w~i', function($m) {
    $prefix = stripos($m[0], 'www') === 0 ? 'http://' : '';
    return "<a href='{$prefix}{$m[0]}'>{$m[0]}</a>";
}, $text);
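
As a quick check of the two edge cases from the question (a trailing '.' and a trailing '</p>'), the same pattern can be wrapped in a small function and run on a sample string. This is only a sketch: the function name linkify() and the sample input are assumptions made for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper wrapping the pattern from this answer; the name is an assumption.
function linkify(string $text): string
{
    // The pattern must end in \w, so a trailing "." or "<" is left out of the match.
    $pattern = '~(https?://|www)[a-z\d.-]+[\w/.?=&%:#]*\w~i';

    return preg_replace_callback($pattern, function (array $m): string {
        // Matches that start with "www" get a scheme so the href is not relative.
        $prefix = stripos($m[0], 'www') === 0 ? 'http://' : '';
        return "<a href='{$prefix}{$m[0]}'>{$m[0]}</a>";
    }, $text);
}

// Sample input (an assumption) covering the trailing "." and "</p>" cases.
$input = '<p>Visit www.something.com. Or https://something.com/a/b</p>';
$output = linkify($input);
echo $output, PHP_EOL;
```

Because preg_replace_callback walks the original string once, a link that has already been produced is not scanned again, which avoids the double replacement that a str_replace loop over the matches can cause.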
Victor
$text =  "<p>Some string www.test.com with urls http://test.com in it http://www.test.com. </p>";
$text = preg_replace_callback("@(http)?(s)?(://)?(([a-zA-Z])([-\w]+\.)+([^\s\.]+[^\s]*)+[^,.\s])@", 'replace_callback', $text);

function replace_callback($matches){
    return '<a href="' . $matches[0] . '" target="_blank">' . $matches[0] . '</a>';
}
Djanym

Your regex was almost correct!

You were matching a literal dot \. followed by zero or more characters from a group that includes the dot.

So I changed it to match a literal dot followed by one or more characters from a group that excludes the dot, which seems to be what you want. Here is the final regex:

((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z0-9\&\/\?\:@\-_=#])+

See it in action: https://regex101.com/r/h5pUvC/3/
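
For a quick local check in PHP, the pattern can be wrapped in / delimiters and passed to preg_match_all. This is a minimal sketch; the sample string is an assumption chosen to cover the trailing '.' and '</p>' cases from the question.

```php
<?php
// The revised pattern from this answer, wrapped in "/" delimiters for PCRE.
$pattern = '/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z0-9\&\/\?\:@\-_=#])+/';

// Sample input (an assumption): one URL followed by ".", one followed by "</p>".
$sample = '<p>See www.something.com. Also http://www.something.com</p>';

// $urls[0] holds the full matches; the other indexes hold the capture groups.
preg_match_all($pattern, $sample, $urls);
print_r($urls[0]);
```

Since the scheme is optional, any two runs of allowed characters joined by a dot will match, so plain text such as this.for is still picked up as a URL.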

Dmitri Chebotarev