21

With PHP, how can I mimic the auto-link behavior of Stack Overflow (which BTW is awesomely cool)?

For instance, the following URL:

http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior

Is converted into this:

<a title="how to mimic stackoverflow auto link behavior" rel="nofollow" href="http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior">stackoverflow.com/questions/1925455/…</a>

I don't really care for the title attribute in this case.


And this:

http://pt.php.net/manual/en/function.base-convert.php#52450

Is converted into this:

<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php#52450">pt.php.net/manual/en/…</a>

How can I make a similar function in PHP?

PS: Check my comments on this question for some more examples and behaviors.

STLDev
Alix Axel
  • So what you are really asking is how to trim the visible text on long links? – Greg Dec 22 '09 at 02:46
  • Yes, matching the actual URL is a piece of cake. The text that is displayed is another story, I can't figure out the logic behind it. – Alix Axel Dec 22 '09 at 03:04
  • It seems to be basically keeping only the first 2 levels, and truncating the rest – K Prime Dec 22 '09 at 03:55
  • @K Prime: Yes but what about http://a.b/c/d/e/f/test? It shows the entire string (5 levels). – Alix Axel Dec 22 '09 at 04:05
  • Just testing! http://www.php.net/manual-manual-manual-manual-manual-manual/en-en-en-en-en-en-en-en-en-en-en-/something/test – Alix Axel Dec 22 '09 at 04:06
  • Looks like below a URI string length threshold, 2 path segments are shown. Beyond the threshold only the first segment is shown. – micahwittman Dec 22 '09 at 04:38
  • http://www.php.net/manual-manual-manual-manual-manual-manual-manual-manual-manual-manual-manual-manual-manual-manual-manual-manual-manual-manual-manual/en-en-en-en-en-en-en-en-en-en-en-/something/test – Alix Axel Dec 28 '09 at 14:53
  • http://www.php.net/manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals/en-en-en-en-en-en-en-en-en-en-en-/something/test – Alix Axel Dec 28 '09 at 14:54
  • http://www.phps.net/manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals-manuals/en-en-en-en-en-en-en-en-en-en-en-/something/test – Alix Axel Dec 28 '09 at 14:54
  • http://www.php.net/manuals/en-en-en-en-en-en-en-en-en-en-en-en-en-en-en-en-en-en-en-en-en-en-en-en-en-en-en-en-en-en-en-en-/something/test – Alix Axel Dec 28 '09 at 14:56
  • http://php.net/manuals/en-en-en-en-en-en-en-en-en-en-en/something/test – Alix Axel Dec 28 '09 at 14:56
  • http://php.net/manuals/en-en-en-en-en-en-en-en-en-en/something/test/ – Alix Axel Dec 28 '09 at 14:58
  • http://php.net/manuals/en/en/en/en/test – pix0r Dec 28 '09 at 20:26
  • http://pt.php.net/manual/en/function.base-convert.php#52450 (testing again) – pix0r Dec 28 '09 at 20:27
  • http://a.b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/z/y/w/z/test – pix0r Dec 28 '09 at 20:37
  • Think I got a solution that works pretty much the same as Stack Overflow: http://stackoverflow.com/questions/1925455#1971451 – pix0r Dec 28 '09 at 20:53
  • Testing FTP: ftp://user:pass@host.com:21/fhjfdfdhj/jhgfdhjdfhd/jfdfdhdfkjfdhfdkhdf/ – Alix Axel Dec 29 '09 at 06:48
  • http://php-php-php-php-php-php-php-php-php-php-php-php-php.net/manuals-manuals-manuals-manuals-manuals/test/ – Alix Axel Dec 29 '09 at 13:22
  • http://pt.wikipedia.org/wiki/Guimarães – Alix Axel Dec 29 '09 at 13:35
  • http://www.morangoscomaçúcar.com/ – Alix Axel Dec 29 '09 at 14:16
  • testing http://www.google.com – Alix Axel Jan 14 '10 at 09:17
  • http://google.com/fdfdfdfd/dffdfdfd/dffddffd – Alix Axel Jan 14 '10 at 09:47
  • http://www.google.com/1/2/3/4/5/index.php?q=lol#01234567890123456789012345678901234567890123456789012345678901234567890123456789 – Alix Axel Jan 14 '10 at 09:55
  • http://www.google.google.google.google.google.google.google.google.google.google.google.google.google.google.google.google.google.google.google.google.google.google.google.google.google.google.google.google.com/1/2/3/4/5/index.php?q=lol#01234567890123456789012345678901234567890123456789012345678901234567890123456789 – Alix Axel Jan 14 '10 at 10:00
  • http://www.example.com/blah_blah_(wikipedia)_and_more_(parens)_eh – Alix Axel Jan 14 '10 at 10:38
  • `http://aççççççççççççççççççççççççççççççççççççççç.com/` -> http://aççççççççççççççççççççççççççççççççççççççç.com/ – Alix Axel Jan 14 '10 at 11:02
  • http://a.b/c/d/e/f/test/ http://a.b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/z/y/w/z/test – Alix Axel Jan 14 '10 at 11:20
  • http://a.b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/z/y/w/z/test/ – Alix Axel Jan 14 '10 at 11:58
  • http://www.google.com/index.php?q=lol#01234567890123456789012345678901234567890123456789012345678901234567890123456789 – Alix Axel Jan 14 '10 at 12:27
  • http://en.wikipedia.org/wiki/The_Game_(mind_game) – Alix Axel Jan 14 '10 at 12:34
  • What about that: (http://en.wikipedia.org/wiki/The_Game_(mind_game)) – Ivan Mar 11 '16 at 12:09

6 Answers

50

Try this out. The URL-matching regex pattern is from Daring Fireball.

/**
 * Replace links in text with HTML links.
 *
 * @param  string $text
 * @return string
 */
function auto_link_text($text)
{
    // A more readably formatted version of the pattern is on
    // http://daringfireball.net/2010/07/improved_regex_for_matching_urls
    // The pattern is wrapped in ~ delimiters, which the preg_* functions require.
    $pattern  = '~(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))~';

    // create_function() was removed in PHP 8.0, so use a closure instead.
    $callback = function ($matches) {
        $url = $matches[0];

        // Display text: host plus path, without a leading "www."
        $text = parse_url($url, PHP_URL_HOST) . parse_url($url, PHP_URL_PATH);
        $text = preg_replace('/^www\./', '', $text);

        // Cut everything after the last slash and append an ellipsis.
        $last = -(strlen(strrchr($text, '/'))) + 1;
        if ($last < 0) {
            $text = substr($text, 0, $last) . '&hellip;';
        }

        return sprintf('<a rel="nofollow" href="%s">%s</a>', $url, $text);
    };

    return preg_replace_callback($pattern, $callback, $text);
}

Input Text:

This is my text.  I wonder if you know about asking questions on StackOverflow:
 Check This out http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior

 Also, base_convert php function?
http://pt.php.net/manual/en/function.base-convert.php#52450

http://pt.php.net/manual/en/function.base-convert.php?wtf=hehe#52450

Output Text:

This is my text.  I wonder if you know about asking questions on StackOverflow:
 Check This out <a rel="nofollow" href="http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior">stackoverflow.com/questions/1925455/&hellip;</a>

 Also, base_convert php function?
<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php#52450">pt.php.net/manual/en/&hellip;</a>

<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php?wtf=hehe#52450">pt.php.net/manual/en/&hellip;</a>
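The truncation step inside the callback is compact, so here it is in isolation. This is only a sketch for illustration: the function name trim_after_last_slash is made up and is not part of the answer above.

```php
<?php
// Hypothetical standalone version of the truncation step used in the callback.
function trim_after_last_slash($text)
{
    // strrchr() returns the tail starting at the last "/", or false if there is none.
    $tail = strrchr($text, '/');
    if ($tail === false || $tail === '/') {
        // No slash, or the text already ends with one: nothing to cut.
        return $text;
    }
    // Drop the tail except for its leading slash, then mark the cut.
    return substr($text, 0, -(strlen($tail) - 1)) . '&hellip;';
}

echo trim_after_last_slash('pt.php.net/manual/en/function.base-convert.php'), "\n";
echo trim_after_last_slash('stackoverflow.com/'), "\n";
```

The first call cuts the file name and keeps the directories; the second is returned unchanged because nothing follows the last slash.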
Rory O'Kane
Eric Coleman
24

This is based on the same daringfireball.net regular expression, but adds a bit more logic than Eric Coleman's example, as well as configuration for the maximum URL length (SO seems to use 50), the maximum path depth when the URL is truncated (SO seems to use 2), and the ellipsis character (&hellip;).

As far as I know this replicates all of the SO URL rewriting functionality, at least as far as what was discussed so far in the comments and responses here.

function auto_link_text($text) {
    $pattern  = '#\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))#';
    return preg_replace_callback($pattern, 'auto_link_text_callback', $text);
}

function auto_link_text_callback($matches) {
    $max_url_length = 50;
    $max_depth_if_over_length = 2;
    $ellipsis = '&hellip;';

    $url_full = $matches[0];
    $url_short = '';

    // A match that starts with "www." has no scheme, so add one before parsing.
    if ($matches[2] == 'www.') {
        $url_full = 'http://' . $url_full;
    }

    if (strlen($url_full) > $max_url_length) {
        $parts = parse_url($url_full);
        $url_short = $parts['scheme'] . '://' . preg_replace('/^www\./', '', $parts['host']) . '/';

        // Build the list of components: each directory, then the query and fragment.
        $url_string_components = array();
        $path = isset($parts['path']) ? trim($parts['path'], '/') : '';
        if ($path !== '') {
            foreach (explode('/', $path) as $dir) {
                $url_string_components[] = $dir . '/';
            }
        }

        if (!empty($parts['query'])) {
            $url_string_components[] = '?' . $parts['query'];
        }

        if (!empty($parts['fragment'])) {
            $url_string_components[] = '#' . $parts['fragment'];
        }

        // Append components until the depth or length limit is reached.
        for ($k = 0; $k < count($url_string_components); $k++) {
            $curr_component = $url_string_components[$k];
            if ($k >= $max_depth_if_over_length || strlen($url_short) + strlen($curr_component) > $max_url_length) {
                if ($k == 0 && strlen($url_short) < $max_url_length) {
                    // Always show a portion of first directory
                    $url_short .= substr($curr_component, 0, $max_url_length - strlen($url_short));
                }
                $url_short .= $ellipsis;
                break;
            }
            $url_short .= $curr_component;
        }

    } else {
        $url_short = $url_full;
    }

    return "<a rel=\"nofollow\" href=\"$url_full\">$url_short</a>";
}

Sample Input:

This is my text.  I wonder if you know about asking questions on StackOverflow:
Check This out http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior

Also, base_convert php function?
http://pt.php.net/manual/en/function.base-convert.php#52450

http://pt.php.net/manual/en/function.base-convert.php?wtf=hehe#52450

http://a.b/c/d/e/f/test

and http://a.b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/z/y/w/z/test

Sample Output:

This is my text.  I wonder if you know about asking questions on StackOverflow:
Check This out <a rel="nofollow" href="http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior">http://stackoverflow.com/questions/1925455/&hellip;</a> 

Also, base_convert php function?
<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php#52450">http://pt.php.net/manual/en/&hellip;</a> 

<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php?wtf=hehe#52450">http://pt.php.net/manual/en/&hellip;</a> 

<a rel="nofollow" href="http://a.b/c/d/e/f/test">http://a.b/c/d/e/f/test</a> 

and <a rel="nofollow" href="http://a.b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/z/y/w/z/test">http://a.b/c/d/&hellip;</a>
pix0r
  • +1, Indeed I was also testing the 50 length thingy and your answer is the most complete one to this question, I wish I had seen it before the bounty expired. – Alix Axel Dec 29 '09 at 06:47
  • Seems to break on links like this: www.google.com - it misses the scheme part. I worked around that with `if ($matches[2] == "www.") { $url_full = "http://" . $url_full; }` – atamur Mar 30 '13 at 20:51
  • thank you. this is the only one that works with this string "google(http://google.com)google" (edit: even stackoverflow's autolink cannot detect this string) – user77177928 Aug 11 '16 at 07:11
5

This will convert the sample string to what you are after. I left out title as that comes from a different source than just a standalone URL and you said that was not important.

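<?php
$urlInput = "http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior";
// $matches[1] captures the host and every directory up to the last slash.
// Only print the link when the pattern actually matched.
if (preg_match('@http://(?:www\.)?(\S+/)\S*(?:\s|$)@i', $urlInput, $matches)) {
    print('<a rel="nofollow" href="' . trim($matches[0]) . '">' . $matches[1] . '...</a>');
}
?>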

Extend as needed to scan through your text.

If you want to match just a certain number of URL path elements, use this RE:

'@http://(?:www\.)?((?:\S+?/){1,3})\S*(?:\s|$)@i'

This extracts up to 3 path elements (the host and up to two directories). You can vary the upper bound in {1,3} to set the maximum number of path elements you want.

Changed the ending \S to allow for zero matches.
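One possible way to "extend as needed" is to wrap the bounded pattern in a helper and run it over free text with preg_replace_callback(). The sketch below is not part of the original answer: the function names shorten_url and auto_link and the $maxElements parameter are made up for illustration, it uses [^\s/]+ instead of \S+? for each path element, and it assumes plain http:// URLs separated by whitespace. It only appends the dots when something was actually cut off.

```php
<?php
// Hypothetical helper: display text for one URL, keeping at most
// $maxElements path elements (the host counts as the first element).
function shorten_url($url, $maxElements = 3)
{
    $pattern = '@^http://(?:www\.)?((?:[^\s/]+/){1,' . (int) $maxElements . '})(\S*)$@i';
    if (!preg_match($pattern, $url, $m)) {
        // No slash after the host (or not an http URL): show it unchanged.
        return $url;
    }
    // $m[1] is the kept prefix, $m[2] is whatever was cut off.
    return $m[2] === '' ? $m[1] : $m[1] . '...';
}

// Hypothetical wrapper: link every http:// URL found in free text.
function auto_link($text, $maxElements = 3)
{
    return preg_replace_callback('@http://\S+@i', function ($m) use ($maxElements) {
        return '<a rel="nofollow" href="' . $m[0] . '">' . shorten_url($m[0], $maxElements) . '</a>';
    }, $text);
}

echo auto_link('See http://www.stackoverflow.com/questions/1925455/how-to-mimic for details');
```

A URL without any slash after the host is shown unchanged, which sidesteps the "no trailing slash" case rather than solving it.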

Kevin Brock
  • +1, WOW, where did the magic come from? I was like "this isn't going to work" but surprisingly it almost did! I can't process Regex at this time but I'll try to understand it tomorrow. – Alix Axel Dec 22 '09 at 03:57
  • Also, I said almost because it fails for the following URLs: `http://a.b/c/d/e/f/test` and `http://a.b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/z/y/w/z/test` – Alix Axel Dec 22 '09 at 03:59
  • http://a.b/c/d/e/f/test and http://a.b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/z/y/w/z/test – Alix Axel Dec 22 '09 at 04:00
  • Also for URLs like `http://www.stackoverflow.com/` it fails: "Notice: Undefined offset: 0 in I:\WWW\index.php on line 35 Notice: Undefined offset: 1 in I:\WWW\index.php on line 35 ..." – Alix Axel Dec 22 '09 at 04:02
  • Was adding the bounded check while the comments were being typed. This should work now for the longer URLs. – Kevin Brock Dec 22 '09 at 04:05
  • Ok, this will now work if the URL just has the host name and trailing slash. However, it is much harder to make this work if there is no trailing slash. – Kevin Brock Dec 22 '09 at 04:23
  • Looks like you're showing `...` even when the URL is very short. – philfreo Dec 26 '09 at 03:06
4

If you have a predictable URL like SO then it should be easy to grab links with a regex and filter out the ones that match the pattern. So if your URL is http://example.com/stuff/1234 then finding http://example.com/stuff/1234/how-to-mimic would be pretty trivial with a regex.

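<?php
// Match URLs such as http://example.com/stuff/1234/how-to-mimic
preg_match_all('#http://example\.com/(\w+)/(\d+)(?:/\S*)?#', $text, $matches, PREG_SET_ORDER);

foreach ($matches as $match)
{
  // $match[0] is the full URL, $match[1] the section, $match[2] the numeric ID
  // do something...
}
?>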
Darrell Brogdon
  • Take `http://pt.php.net/manual/en/function.base-convert.php#52450` for instance. Check the comment on my question for the output. – Alix Axel Dec 18 '09 at 00:20
3

Based somewhat on Kevin Brock's answer, but allows configurable params (folder depth & URL length), and accepts URLs without trailing slashes:

$url = 'http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior';
$output = '';
$params = array (
    'length' => 10,
    'depth' => 2,
);
preg_match ('@http://(?:www\.)?([^/?# ]+)(/\S+)?(?=\s|$)@i', $url, $matches);
if (isset ($matches[2]))
{
    $parts = explode('/', substr($matches[2], 1));
    if (count($parts) > $params['depth'] && strlen($matches[1].$matches[2]) > $params['length'])
        $output = $matches[1].'/'.implode('/', array_slice($parts, 0, $params['depth'])).'/...';
    else
        $output = $matches[1].$matches[2];
}
else
    $output = $matches[1];

echo '<a href="'.$matches[0].'">'.$output.'</a>';

Hope this helps

K Prime
  • This answer seems to be the most flexible so far however it's pretty difficult to use it to replace URLs in free text since it does not use `preg_replace()`. – Alix Axel Dec 28 '09 at 15:01
  • You could convert it to a function, and use that as the callback to `preg_replace_callback()` – K Prime Dec 29 '09 at 01:58
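The suggestion in the last comment could look roughly like this. It is a sketch under assumptions, not K Prime's code: the names shorten_link and auto_link_free_text are hypothetical, and the pattern and the $params keys are taken from the answer as written.

```php
<?php
// Hypothetical wrapper around the answer's logic; $params uses the same keys.
function shorten_link(array $matches, array $params)
{
    // $matches[1] is the host (without "www."), $matches[2] the optional path.
    if (!isset($matches[2])) {
        $output = $matches[1];
    } else {
        $parts = explode('/', substr($matches[2], 1));
        if (count($parts) > $params['depth'] && strlen($matches[1] . $matches[2]) > $params['length']) {
            $output = $matches[1] . '/' . implode('/', array_slice($parts, 0, $params['depth'])) . '/...';
        } else {
            $output = $matches[1] . $matches[2];
        }
    }
    return '<a href="' . $matches[0] . '">' . $output . '</a>';
}

// Hypothetical entry point: replace every matching URL in free text.
function auto_link_free_text($text, array $params = array('length' => 10, 'depth' => 2))
{
    $pattern = '@http://(?:www\.)?([^/?# ]+)(/\S+)?(?=\s|$)@i';
    return preg_replace_callback($pattern, function ($matches) use ($params) {
        return shorten_link($matches, $params);
    }, $text);
}

echo auto_link_free_text('Read http://a.b/c/d/e/f/test and http://www.google.com today');
```

The closure only exists to pass $params along; preg_replace_callback() itself hands the callback nothing but the match array.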
1

See Regex (regular expression) to match a URL:

https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?

PHP Example: Automatically link URLs inside text.

$text = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $text);
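The one-liner inserts the matched URL into the markup as-is. If the input is plain text, a callback variant can escape characters such as & before output. This is a minimal sketch under that assumption; the function name link_urls_escaped is hypothetical, and escaping the rest of the text is left to the caller.

```php
<?php
// Hypothetical variant of the one-liner above that escapes the URL before output.
function link_urls_escaped($text)
{
    // Same pattern as in the answer.
    $pattern = '@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@';
    return preg_replace_callback($pattern, function ($m) {
        // $m[0] is the whole matched URL.
        $safe = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
        return '<a href="' . $safe . '">' . $safe . '</a>';
    }, $text);
}

echo link_urls_escaped('Try http://example.com/search?q=a&b=c now');
```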
Community
CodeJoust