I want to filter input text if it contains a URL. By URL I mean anything that corresponds to a valid internet address, such as www.example.com, example.com, http://www.example.com, or http://example.com/foo/bar.

I think I need regular expressions and the preg_match function, so I'm looking for the correct regexp pattern for this purpose.
I'd be very grateful if anybody could provide one.

hakre
2hamed
  • possible duplicate of: http://stackoverflow.com/questions/1449618/how-to-find-a-url-from-a-content-by-php http://stackoverflow.com/questions/6948901/php-preg-match-to-find-and-locate-a-dynamic-url-from-html-pages – Lawrence Cherone Aug 07 '11 at 15:18
  • Possible duplicate: [How to extract http links from a paragraph and store them in a array on php](http://stackoverflow.com/questions/6861324/how-to-extract-http-links-from-a-paragraph-and-store-them-in-a-array-on-php) – hakre Aug 07 '11 at 16:23
  • By *filter* what do you mean? Remove everything else or remove all if it does not match? How many URLs can be part of the input, just one? – hakre Aug 07 '11 at 16:24
  • This link may help to write regular expression http://www.addedbytes.com/download/regular-expressions-cheat-sheet-v2/pdf/ – Rajasekar Gunasekaran Aug 07 '11 at 15:18
  • By filtering I mean detecting text that has a URL in it and preventing it from being stored in the DB. – 2hamed Aug 09 '11 at 07:37

3 Answers

This article has a nice regex for matching URLs: http://daringfireball.net/2010/07/improved_regex_for_matching_urls

In PHP the pattern has to be wrapped in delimiters and escaped for a double-quoted string, for example like this:

$text = "here is some text that contains a link to www.example.com, and it will be matched.";
preg_match("/(?i)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/", $text, $matches);
var_dump($matches);
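
For the goal stated in the question (refusing to store text that contains a URL), a yes/no check is enough, so the return value of preg_match can be used directly. The following is only a minimal sketch: `contains_url` is a hypothetical helper name, and its pattern is a deliberately simplified assumption, not the regex from the article. Because it also accepts bare domains such as example.com, it will flag things like a missing space after a period ("end.Start"), so it should be tuned against real input.

```php
<?php
// Hypothetical helper (not part of the answer above): true when $text
// appears to contain a URL. The pattern is a simplified assumption.
function contains_url(string $text): bool
{
    $pattern = '~\b(?:'
        . '(?:https?|ftp)://\S+'                        // scheme-prefixed URLs
        . '|www\.\S+'                                   // www.example.com
        . '|[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}\b'   // bare example.com
        . ')~i';

    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $text) === 1;
}

$input = 'Check example.com for details';
if (contains_url($input)) {
    echo "Rejected: input contains a URL\n";
}
```

The strict `=== 1` comparison treats a regex error (`false`) as "no URL found"; if that is too lenient for your case, check for `false` separately and reject the input.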
AHM
$html = "http://www.scroogle.org
http://www.scroogle.org/
http://www.scroogle.org/index.html
http://www.scroogle.org/index.html?source=library
You can surf the internet anonymously at https://ssl.scroogle.org/cgi-bin/nbbwssl.cgi.";


preg_match_all('/\b((?P<protocol>https?|ftp):\/\/(?P<domain>[-A-Z0-9.]+)(?P<file>\/[-A-Z0-9+&@#\/%=~_|!:,.;]*)?(?P<parameters>\?[A-Z0-9+&@#\/%=~_|!:,.;]*)?)/i', $html, $urls, PREG_PATTERN_ORDER);
$firstUrl = $urls[1][0]; // first matched URL; $urls itself is left intact for the loop below

This will match the URL on each of these lines:

http://www.scroogle.org

http://www.scroogle.org/

http://www.scroogle.org/index.html

http://www.scroogle.org/index.html?source=library

You can surf the internet anonymously at https://ssl.scroogle.org/cgi-bin/nbbwssl.cgi.

To loop over the results you can use:

for ($i = 0; $i < count($urls[0]); $i++) {
    echo $urls[1][$i]."\n";
}

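This will output the following (the last match keeps the sentence's trailing period, because "." is allowed in the path character class):

http://www.scroogle.org
http://www.scroogle.org/
http://www.scroogle.org/index.html
http://www.scroogle.org/index.html?source=library
https://ssl.scroogle.org/cgi-bin/nbbwssl.cgi.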
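
The pattern defines named groups (protocol, domain, file, parameters) that the loop above never reads. As a rough sketch of how they could be used, the hypothetical helper `extract_domains` below collects just the host of each URL. It reuses the protocol, domain and file groups from the pattern above and leaves out the parameters group; the helper name and the trimming of a trailing period are my own assumptions.

```php
<?php
// Hypothetical helper: collects the host part of every URL found in $text.
function extract_domains(string $text): array
{
    // Same protocol/domain/file groups as the answer's pattern, parameters omitted.
    $pattern = '/\b(?P<protocol>https?|ftp):\/\/(?P<domain>[-A-Z0-9.]+)(?P<file>\/[-A-Z0-9+&@#\/%=~_|!:,.;]*)?/i';

    // PREG_SET_ORDER yields one array per match, so the named groups of a
    // single URL can be read together.
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $domains = [];
    foreach ($matches as $match) {
        // The domain class allows ".", so drop a sentence-ending period if captured.
        $domains[] = strtolower(rtrim($match['domain'], '.'));
    }
    return $domains;
}

print_r(extract_domains('See http://www.scroogle.org/index.html or https://ssl.scroogle.org.'));
```

PREG_SET_ORDER is used here instead of PREG_PATTERN_ORDER only because it groups the results per URL; either flag exposes the named groups.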

cheers, Lob

Pedro Lobito

Found here: http://zenverse.net/php-function-to-auto-convert-url-into-hyperlink/

Functions from WordPress.

function _make_url_clickable_cb($matches) {
    $ret = '';
    $url = $matches[2];

    if ( empty($url) )
        return $matches[0];
    // remove a single trailing [.,;:] from the URL and re-append it after the link
    if ( in_array(substr($url, -1), array('.', ',', ';', ':')) === true ) {
        $ret = substr($url, -1);
        $url = substr($url, 0, strlen($url)-1);
    }
    return $matches[1] . "<a href=\"$url\" rel=\"nofollow\">$url</a>" . $ret;
}

function _make_web_ftp_clickable_cb($matches) {
    $ret = '';
    $dest = $matches[2];

    // check before prepending the scheme; afterwards $dest can never be empty
    if ( empty($dest) )
        return $matches[0];
    $dest = 'http://' . $dest;
    // remove a single trailing [.,;:] from the URL and re-append it after the link
    if ( in_array(substr($dest, -1), array('.', ',', ';', ':')) === true ) {
        $ret = substr($dest, -1);
        $dest = substr($dest, 0, strlen($dest)-1);
    }
    return $matches[1] . "<a href=\"$dest\" rel=\"nofollow\">$dest</a>" . $ret;
}

function _make_email_clickable_cb($matches) {
    $email = $matches[2] . '@' . $matches[3];
    return $matches[1] . "<a href=\"mailto:$email\">$email</a>";
}

function make_clickable($ret) {
    $ret = ' ' . $ret;
    // in testing, using arrays here was found to be faster
    $ret = preg_replace_callback('#([\s>])([\w]+?://[\w\\x80-\\xff\#$%&~/.\-;:=,?@\[\]+]*)#is', '_make_url_clickable_cb', $ret);
    $ret = preg_replace_callback('#([\s>])((www|ftp)\.[\w\\x80-\\xff\#$%&~/.\-;:=,?@\[\]+]*)#is', '_make_web_ftp_clickable_cb', $ret);
    $ret = preg_replace_callback('#([\s>])([.0-9a-z_+-]+)@(([0-9a-z-]+\.)+[0-9a-z]{2,})#i', '_make_email_clickable_cb', $ret);

    // this one is not in an array because we need it to run last, for cleanup of accidental links within links
    $ret = preg_replace("#(<a( [^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i", "$1$3</a>", $ret);
    $ret = trim($ret);
    return $ret;
}

Usage:

$string = 'I have some texts here and also links such as http://www.youtube.com , www.haha.com and lol@example.com. They are ready to be replaced.';

echo make_clickable($string);
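
The callbacks above strip one trailing punctuation character so that a sentence-ending "." or "," does not become part of the link. That step can be isolated as shown below. This is only a sketch: `split_trailing_punctuation` is a hypothetical name, and unlike the callbacks it uses rtrim, so it removes every trailing [.,;:] rather than just one.

```php
<?php
// Hypothetical helper: separates a URL from any trailing [.,;:] characters.
// Returns [url without trailing punctuation, the removed punctuation].
function split_trailing_punctuation(string $url): array
{
    $trimmed = rtrim($url, '.,;:');
    // Whatever rtrim removed is the tail of the original string.
    $trailing = (string) substr($url, strlen($trimmed));
    return [$trimmed, $trailing];
}

[$url, $rest] = split_trailing_punctuation('http://www.youtube.com.');
echo $url, ' | ', $rest, "\n";
```

The same split is useful when matching URLs with any of the patterns in this thread, since most of them allow "." inside the path and will otherwise swallow sentence punctuation.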
Buddy