9

I have a function that will add the <a href> tag before a link and </a> after the link. However, it breaks for some webpages. How would you improve this function? Thanks!

function processString($s) 
{
    // check if there is a link

    if(preg_match("/http:\/\//",$s))
    {
        print preg_match("/http:\/\//",$s);


        $startUrl =  stripos($s,"http://");

        // if the link is in between text
        if(stripos($s," ",$startUrl)){
            $endUrl = stripos($s," ",$startUrl);
        }
        // if link is at the end of string
        else {$endUrl = strlen($s);}

        $beforeUrl = substr($s,0,$startUrl);
        $url = substr($s,$startUrl,$endUrl-$startUrl);
        $afterUrl = substr($s,$endUrl);

        $newString = $beforeUrl."<a href=\"$url\">".$url."</a>".$afterUrl;

        return $newString;
    }

    return $s;
}
AlexBrand
  • The regex is a little sloppy, but 99% of my input will have correct URLs if any – AlexBrand Nov 18 '10 at 16:53
  • What webpages does it break for? – Pekka Nov 18 '10 at 16:54
  • At the beginning you test against https as well, but later you omit the "s". I don't know whether this causes the error, because I also don't know which pages are broken ;) – KingCrunch Nov 18 '10 at 16:58
  • Sorry, I removed the [s] from the regex. How could I include functionality for strings such as "www.google.com", or "https:www.example.com" ? – AlexBrand Nov 18 '10 at 17:00
  • "www.google.com" is going to be harder to parse. You need a long regex just to accommodate all TLDs. – bcosca Nov 18 '10 at 17:03

3 Answers

20
function processString($s) {
    return preg_replace('/https?:\/\/[\w\-\.!~#?&=+\*\'"(),\/]+/','<a href="$0">$0</a>',$s);
}
Aron Rotteveel
bcosca
1

It breaks for all URLs that contain "special" HTML characters. To be safe, pass the three string components through htmlspecialchars() before concatenating them together (unless you want to allow HTML outside the URL).
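
A minimal sketch of that suggestion, keeping the question's string-splitting approach. The name `processStringEscaped` is made up for illustration, and the sketch assumes the input is plain UTF-8 text containing at most one `http://` link, as in the original function:

```php
<?php
// Hypothetical variant of the question's processString(): same splitting
// logic, but every piece is HTML-escaped before concatenation.
function processStringEscaped($s)
{
    $startUrl = stripos($s, "http://");
    if ($startUrl === false) {
        // No link found: still escape, since the result is meant to be HTML.
        return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    }

    // The URL ends at the next space, or at the end of the string.
    $endUrl = strpos($s, " ", $startUrl);
    if ($endUrl === false) {
        $endUrl = strlen($s);
    }

    $beforeUrl = htmlspecialchars(substr($s, 0, $startUrl), ENT_QUOTES, 'UTF-8');
    $url       = htmlspecialchars(substr($s, $startUrl, $endUrl - $startUrl), ENT_QUOTES, 'UTF-8');
    $afterUrl  = htmlspecialchars(substr($s, $endUrl), ENT_QUOTES, 'UTF-8');

    return $beforeUrl . '<a href="' . $url . '">' . $url . '</a>' . $afterUrl;
}
```

With this version, `processStringEscaped('see http://example.com/?a=1&b=2 <now>')` should give `see <a href="http://example.com/?a=1&amp;b=2">http://example.com/?a=1&amp;b=2</a> &lt;now&gt;`, so the `&` in the query string and the stray `<now>` no longer end up as raw markup.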

tdammers
1
function processString($s){
  return preg_replace('@((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)@', '<a href="$1">$1</a>', $s);
}

Found it here
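
One thing to watch with an optional scheme: a match like "www.google.com" ends up as `href="www.google.com"`, which browsers treat as a relative path. A rough sketch of one way around that, using `preg_replace_callback` to prepend `http://` when the scheme is missing. The function name `linkifyWithScheme` and the pattern are illustrative assumptions, deliberately simpler than the regex above; it only recognizes bare links that start with `www.`, it will swallow trailing punctuation such as a final period, and it assumes the surrounding text is already safe to output as HTML:

```php
<?php
// Hypothetical helper: links "http(s)://..." and bare "www...." text.
function linkifyWithScheme($s)
{
    return preg_replace_callback(
        // A scheme or a leading "www.", then everything up to whitespace,
        // quotes, or angle brackets.
        '@\b(?:https?://|www\.)[^\s<>"\']+@i',
        function ($m) {
            $text = $m[0];
            // Bare "www." matches get an explicit scheme so the href is absolute.
            $href = preg_match('@^https?://@i', $text) ? $text : 'http://' . $text;
            return '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
                 . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</a>';
        },
        $s
    );
}
```

For example, `linkifyWithScheme('go to www.google.com now')` should return `go to <a href="http://www.google.com">www.google.com</a> now`, while a string that already contains `https://` keeps its scheme.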

egze