I am using the function below to find plain-text URLs and convert them to hyperlinks. First of all, is it correct? It appears to work, but do you know of a (perhaps malformed) URL that would break it?

My question is whether it can be made to support port numbers as well. For example, stackoverflow.com:80/index is not converted, because the port is not recognised as a valid part of the URL.

So in summary, I am looking for Stack Overflow-style URL recognition, which I believe is a custom addition to Markdown.

  /**
   * Search for and create links from urls
   */
  static public function autoLink($text) {
    $pattern = "/(((http[s]?:\/\/)|(www\.))(([a-z][-a-z0-9]+\.)?[a-z][-a-z0-9]+\.[a-z]+(\.[a-z]{2,2})?)\/?[a-z0-9._\/~#&=;%+?-]+[a-z0-9\/#=?]{1,1})/is";
    $text = preg_replace($pattern, " <a href='$1'>$1</a>", $text);
    // fix URLs without protocols
    $text = preg_replace("/href='www/", "href='http://www", $text);

    return $text;
  } 

Thanks for your time,

Pez Cuckow
  • Your function won't work for URLs on subdomains (e.g. `my.domain.com/mypage`) – user229044 Aug 23 '11 at 17:14
  • How accurate do you want things to be? [www.ca](http://www.ca) is a completely valid URL, but not one you'd expect to see regularly. There are plenty of things that ARE hostnames but definitely do not look like one. – Marc B Aug 23 '11 at 17:18
  • Ideally it would cover all possibilities, but I doubt anyone will point to a URL like www.ca. It would be interesting to see how Stack Overflow's one works; it seems very good! – Pez Cuckow Aug 23 '11 at 17:23
  • @Pez: Stack Overflow uses [MarkdownSharp](http://blog.stackoverflow.com/2009/12/introducing-markdownsharp/) with "Stack Exchange additions": http://stackoverflow.com/editing-help. For PHP, the original Markdown project recommends the [PHP Markdown](http://michelf.com/projects/php-markdown/) port by Michel Fortin. – Daniel Trebbien Aug 23 '11 at 17:58

3 Answers

You should also look at the answers to this question: How to mimic StackOverflow Auto-Link Behavior


I ended up combining the answers I got on Stack Overflow with suggestions from colleagues. The code below is the best we could come up with.

  /**
   * Search for and create links from urls
   */
  static public function autoLink($text) {
    $pattern = "/\b((?P<protocol>(https?)|(ftp)):\/\/)?(?P<domain>[-A-Z0-9\\.]+)[.][A-Z]{2,7}(([:])?([0-9]+)?)(?P<file>\/[-A-Z0-9+&@#\/%=~_|!:,\\.;]*)?(?P<parameters>\?[A-Z0-9+&@#\/%=~_|!:,\\.;]*)?/is";

    // The /e modifier was removed in PHP 7, so the link is built in a callback
    return preg_replace_callback($pattern, function ($m) {
      // fix URLs without protocols
      $href = empty($m['protocol']) ? 'http://' . $m[0] : $m[0];
      // escape both the attribute value and the link text
      return ' <a href="' . htmlspecialchars($href) . '">' . htmlspecialchars($m[0]) . '</a>';
    }, $text);
  }
Community
Pez Cuckow
  • This function breaks your HTML when there is no protocol: simple inputs like www.google.com and info@google.com are converted into invalid HTML. – bart Oct 09 '11 at 18:06
  • In the final version I put in some checks to prevent this. Unfortunately I no longer have access. – Pez Cuckow Oct 10 '11 at 21:43

Rather than writing your own autolinking routine, which is essentially the beginning of a custom markup engine, you might want to use an open source markup engine, as it is less likely to be vulnerable to cross-site scripting attacks. One example of an open source markup engine for PHP is PHP Markdown, which has the ability to autolink URLs and essentially uses the same Markdown syntax that is in use at Stack Overflow.

One note: you should always escape HTML special characters using htmlspecialchars() before sticking the text into attributes or in the inner text of elements.
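
As a rough illustration of that note, here is a minimal sketch of escaping a URL before placing it in both the `href` attribute and the link text. The helper name `buildLink` is made up for this example and is not part of PHP Markdown or the question's code. Note that escaping alone does not validate the URL scheme.

```php
<?php
// Hypothetical helper for illustration only.
function buildLink($url)
{
    // ENT_QUOTES escapes single and double quotes as well as &, < and >,
    // so the value is safe inside either style of quoted attribute.
    $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

    return '<a href="' . $safe . '">' . $safe . '</a>';
}

echo buildLink('http://example.com/?a=1&b=2'), "\n";
```

The same escaped string is reused for the attribute and the element text, since both contexts need the special characters encoded.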

Daniel Trebbien
$pattern = "/\b(?P<protocol>https?|ftp):\/\/(?P<domain>[-A-Z0-9.]+)(([:])?([0-9]+)?)(?P<file>\/[-A-Z0-9+&@#\/%=~_|!:,.;]*)?(?P<parameters>\?[A-Z0-9+&@#\/%=~_|!:,.;]*)?/i";

will match:

http://www.scroogle.org/index.html

http://www.scroogle.org:80/index.html?source=library
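
To check what the pattern captures, something like the following sketch could be used. The sample sentence is made up. The port has no named group in this pattern, so it is read by its numeric index (group 5).

```php
<?php
$pattern = "/\b(?P<protocol>https?|ftp):\/\/(?P<domain>[-A-Z0-9.]+)(([:])?([0-9]+)?)(?P<file>\/[-A-Z0-9+&@#\/%=~_|!:,.;]*)?(?P<parameters>\?[A-Z0-9+&@#\/%=~_|!:,.;]*)?/i";

// Sample input, invented for this example
$text = 'See http://www.scroogle.org:80/index.html?source=library for details';

if (preg_match($pattern, $text, $m)) {
    echo 'url:        ', $m[0], "\n";
    echo 'protocol:   ', $m['protocol'], "\n";
    echo 'domain:     ', $m['domain'], "\n";
    echo 'port:       ', $m[5], "\n"; // unnamed group holding the port digits
    echo 'file:       ', $m['file'], "\n";
    echo 'parameters: ', $m['parameters'], "\n";
}
```

Because the protocol is mandatory in this version of the pattern, bare hosts such as www.scroogle.org without `http://` are not matched.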

Pez Cuckow
Pedro Lobito