I found the following regex online, but I'm having trouble implementing it:

(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?

This is what I want the PHP to do:

Take the following: Look here: http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php

And turn it into: Look here: <a href="http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php">http://www.rocketlanguages.com/span...anish_accents.php</a>

If the URL is long, the link text should be shortened with a ... in the middle.

Fluffeh
Jake
  • +1 for properly articulating the problem – N.B. Sep 12 '12 at 10:14
  • I don't know why the answer pointing to filter_var() (http://php.net/manual/en/function.filter-var.php) was removed; it seemed okay for resolving the first part of the question – Del Pedro Sep 12 '12 at 10:15
  • Basically duplicate but not exactly because of the ellipsis requirement: http://stackoverflow.com/questions/5080826/php-linkify-links-in-content - you'll note that the regex in the accepted solution is considerably more complicated than the one you propose! – DaveRandom Sep 12 '12 at 10:16
  • @DelPedro I removed it because the issue is neither to validate the URL nor to extract some part of it, but to extract URLs from a block of text and linkify them, for which regex is really the only tool in the PHP toolbox. "I gone didn't read teh questionz proper" – DaveRandom Sep 12 '12 at 10:19
  • Also how would it work with multiple urls in a text? – Jake Sep 12 '12 at 10:52
  • The ellipsis part could be dealt with using CSS `text-overflow:ellipsis`, rather than trying to truncate it in PHP. This will simplify the code quite significantly. – SDC Sep 12 '12 at 12:53
  • @DaveRandom: The URL on the stackoverflow you point to is a doozy! Very useful. – David Sep 17 '12 at 06:13

2 Answers

Try this:

// URL regex from here:
// http://daringfireball.net/2010/07/improved_regex_for_matching_urls
define( 'URL_REGEX', <<<'_END'
~(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))~
_END
);

// PHP 5.3 or higher, can use closures (anonymous functions)
function replace_urls_with_anchor_tags( $string,
                                        $length = 50,
                                        $elision_string = '...' ) {
    $replace_function = function( $matches ) use ( $length, $elision_string) {
        $matched_url = $matches[ 0 ];
        return '<a href="' . $matched_url . '">' .
                abbreviated_url( $matched_url, $length, $elision_string )   .
                '</a>';
    };
    return preg_replace_callback(
        URL_REGEX,
        $replace_function,
        $string
    );
}

function abbreviated_url( $url, $length = 50, $elision_string = '...' ) {
    if ( strlen( $url ) <= $length ) {
        return $url;
    }
    $width_either_side = (int) ( ( $length - strlen( $elision_string ) ) / 2 );
    $left  = substr( $url, 0, $width_either_side );
    $right = substr( $url, strlen( $url ) - $width_either_side );

    return $left . $elision_string . $right;
}

(The backtick in the URL_REGEX definition confuses stackoverflow.com's syntax highlighting, but it's nothing to be concerned about.)

The function replace_urls_with_anchor_tags takes a string and changes all the URLs matched within to anchor tags, shortening long URLs by eliding with ellipses. The function takes optional length and elision_string arguments in case you wish to play around with the length and change the ellipses to something else.

Here's a usage example:

// Test it out
$test = <<<_END
Look here:
http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php

And here:
http://stackoverflow.com/questions/12385770/implementing-web-address-regular-expression
_END;

echo replace_urls_with_anchor_tags( $test, 50, '...' );
// OUTPUT:
// Look here:
// <a href="http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php">http://www.rocketlangua...ion_spanish_accents.php</a>
//
// And here:
// <a href="http://stackoverflow.com/questions/12385770/implementing-web-address-regular-expression">http://stackoverflow.co...ress-regular-expression</a>
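
The 23 characters kept on each side of the ellipsis in the output above follow from the arithmetic in abbreviated_url. As a rough sketch, here is that calculation on its own. The function is copied in only so the snippet runs standalone, and $url is simply the first test URL from the example:

```php
<?php
// Standalone copy of abbreviated_url() from above, so this snippet runs by itself.
function abbreviated_url( $url, $length = 50, $elision_string = '...' ) {
    if ( strlen( $url ) <= $length ) {
        return $url;
    }
    // (50 - 3) / 2 = 23.5, truncated to 23 characters per side
    $width_either_side = (int) ( ( $length - strlen( $elision_string ) ) / 2 );
    $left  = substr( $url, 0, $width_either_side );
    $right = substr( $url, strlen( $url ) - $width_either_side );
    return $left . $elision_string . $right;
}

$url   = 'http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php';
$short = abbreviated_url( $url, 50, '...' );

echo $short, "\n";           // http://www.rocketlangua...ion_spanish_accents.php
echo strlen( $short ), "\n"; // 49: 23 + 3 + 23, one short of $length because 47 is odd
```

Note that strlen and substr count bytes, so a multibyte elision string such as a single-character ellipsis would throw the widths off slightly. If the mbstring extension is available, mb_strlen and mb_substr would be the usual substitutes.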

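Note that if you are using PHP 5.2 or lower you must rewrite replace_urls_with_anchor_tags to use create_function instead of a closure, since closures were not introduced until PHP 5.3. The nowdoc syntax (<<<'_END') used to define URL_REGEX is also new in PHP 5.3, so on 5.2 the pattern has to be written as an ordinary quoted string. create_function itself was deprecated in PHP 7.2 and removed in PHP 8.0, so this version is only for legacy installations: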

// No closures in PHP 5.2, must use create_function()
function replace_urls_with_anchor_tags( $string,
                                        $length = 50,
                                        $elision_string = '...' ) {
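    // Build the callback body as a string. The arguments are baked in as
    // literals: (int) guards $length, and var_export() quotes $elision_string
    // safely even if it contains quotes or dollar signs.
    $replace_function = create_function(
        '$matches',
        'return "<a href=\"$matches[0]\">" .
                abbreviated_url( $matches[ 0 ], '            .
                                 (int) $length . ', '        .
                                 var_export( $elision_string, true ) .
                               ') . "</a>";'
    );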
    return preg_replace_callback(
        URL_REGEX,
        $replace_function,
        $string
    );
}

Note that I replaced the URL regex you had found with one linked to on the page DaveRandom referred to in his comment. It's more complete, and there is a mistake in the regex you were using: a couple of '/' characters are not escaped (in here: [\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#]), which breaks the pattern when '/' is used as the delimiter. Also, it has no explicit handling for port numbers like 80 or 8080.
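
If you would still rather use the regex from the question, one way around the delimiter problem is to pick a delimiter that never appears in the pattern. The sketch below assumes `!` as the delimiter (the pattern itself contains `/`, `~`, `#`, `@` and `%`, so those are out) and uses a made-up sample string. Treat it as an illustration of the delimiter issue, not a recommendation of that regex:

```php
<?php
// The regex from the question, unchanged, wrapped in '!' delimiters.
// '!' is an arbitrary choice; it works because the pattern never contains it.
$pattern = '!(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?!';

// Hypothetical sample text with two URLs.
$text = 'See http://example.com/a/b and https://example.org/x?y=1.';

// preg_match_all() collects every match; $matches[0] holds the full URLs.
preg_match_all( $pattern, $text, $matches );
print_r( $matches[0] );
// Array
// (
//     [0] => http://example.com/a/b
//     [1] => https://example.org/x?y=1
// )
```

The trailing full stop is left out of the second match because the final character class of the pattern does not include '.'.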

Hope this helps.

David

I am using this Regular expression and it is working fine for me, try this if you want

(http|https|ftp):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?
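
For anyone trying this pattern for linkifying, a small sketch of how it behaves may help. The `~` delimiters, the `i` flag, the sample text, and the `\S*` variant are all assumptions added here, not part of the pattern as posted. Because the pattern ends in (\/.*)?, a URL with a path swallows the rest of the line; restricting the tail to non-whitespace is one possible adjustment:

```php
<?php
// Pattern as posted; '~' delimiters and the 'i' flag are added here
// (the character classes are lowercase-only, so 'i' lets it match uppercase hosts).
$original = '~(http|https|ftp):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?~i';

// Hypothetical variant: the trailing .* becomes \S* so the match stops at whitespace.
$variant  = '~(http|https|ftp):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/\S*)?~i';

// Made-up sample text.
$text = 'Visit http://example.com/page now';

preg_match( $original, $text, $m1 );
preg_match( $variant,  $text, $m2 );

echo $m1[0], "\n"; // http://example.com/page now
echo $m2[0], "\n"; // http://example.com/page
```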
iLaYa ツ