
I'm currently developing a little browser-based Twitter widget.

Right now I'm stuck on getting the URLs to work. I'm fairly new to regex (I know how to extract parts of a string, but this one is tough).

So, I need a regex that would search/replace

www.domain.tld -> <a href="http://www.domain.tld">http://www.domain.tld</a>

Preferably, it should work with or without a leading http://.

Any advice is welcome. Thanks.

4 Answers

0

This is how far I've got:

www\.(?:\S*)\.(?:\S{2,3})

It checks for www. at the beginning, then any non-whitespace characters, then a top-level domain of two or three characters.
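As a rough illustration of how that pattern could drive the search/replace from the question, here is a minimal sketch. The function name linkify_www is hypothetical, and the case-insensitive flag is an addition that is not part of the pattern above.

```php
<?php
// Hypothetical helper (the name is not from the original post).
function linkify_www(string $text): string
{
    // The pattern from this answer, wrapped in delimiters; the "i" flag is an added assumption.
    $pattern = '/www\.(?:\S*)\.(?:\S{2,3})/i';

    // $0 is the whole match. "http://" is prepended because the pattern
    // only matches text that starts at "www.".
    return preg_replace($pattern, '<a href="http://$0">http://$0</a>', $text);
}

echo linkify_www('Visit www.domain.tld today'), "\n";
```

Note that if the input already contains http://www.domain.tld, this pattern still matches from www. onward, so the original http:// prefix would be left sitting in front of the generated link.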

  • 1
    .info? .mobi? .museum? You should probably check for that. –  May 19 '10 at 22:22
  • Actually, I could even just check for non-whitespace characters, since most URLs have parameters too (?param=value&etc=1). Of course, I'll need to sanitize the input first to guard against XSS. – Kristaps May 19 '10 at 22:32
0

I'm in a never-ending war against regexes; I don't like them. So I'd do it like this instead:

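function get_domain_from_anchor($anchor, $delimiter = '"') {
    // Take everything from the first delimiter on, cut it off before the closing
    // delimiter plus '>', then drop the first 8 characters (the delimiter and 'http://').
    return substr(strstr(strstr($anchor, $delimiter), $delimiter.'>', true), 8);
}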

echo get_domain_from_anchor('<a href="http://www.domain.net">http://www.domain.net</a>');

// OUTPUTS: www.domain.net

Much better :D

Sune Rasmussen
0

I believe this is exactly what you're looking for: PHP validation/regex for URL

Some more information regarding extraction of URLs: Extract URLs from text in PHP

Community
Coding District
  • Thank you, I came up with ((?:http:\/\/|https:\/\/)(?:(?:[a-z0-9\&\.?=\-_\[\]\/])*)) Seems that'll work. Thank you! – Kristaps May 19 '10 at 22:54
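One way that pattern might be combined with the sanitizing mentioned earlier in the thread is sketched below. This is only an illustration under assumptions: the function name linkify_escaped is hypothetical, the "i" flag is an addition, and escaping each piece with htmlspecialchars is one possible anti-XSS measure, not something the thread settles on.

```php
<?php
// Hypothetical helper (the name is not from the original thread).
function linkify_escaped(string $text): string
{
    // The pattern from the comment above; the "i" flag is an added assumption.
    $pattern = '/((?:http:\/\/|https:\/\/)(?:(?:[a-z0-9\&\.?=\-_\[\]\/])*))/i';

    // Split the text on URLs but keep them: with one capture group,
    // even indexes hold plain text and odd indexes hold the matched URLs.
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        // Escape every piece so markup in the tweet text is not rendered.
        $safe = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        $out .= ($i % 2 === 1)
            ? '<a href="' . $safe . '">' . $safe . '</a>'
            : $safe;
    }
    return $out;
}

echo linkify_escaped('see http://example.com/?a=1&b=2 <b>now</b>'), "\n";
```

Splitting first and escaping each piece afterwards avoids running the pattern over already-escaped text, where an & in a query string would have become &amp; and the ; would cut the match short.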
0

Try twitter-text-php. It is ported to PHP from the official Twitter code.

From the README file:

$autolinker = new Twitter_Autolink();
$html = $autolinker->autolink("Tweet mentioning @mikenz and referring to his list @mikeNZ/sports and website http://mikenz.geek.nz");
echo $html;
mcrumley