I have the following code, which parses a tweet and turns URLs, mentions and hashtags into links:

function parseTwitterText($text) {

    $returnText = $text;
    $hashPattern = '/\#([A-Za-z0-9\_]+)/i';
    $mentionPattern = '/\@([A-Za-z0-9\_]+)/i';
    $urlPattern = '/(http[s]?\:\/\/[^\s]+)/i';
    $robotsFollow = false;

    // SCAN FOR LINKS FIRST!!! Otherwise it will replace the hashes and mentions
    $returnText = preg_replace($urlPattern, '<a href="$1" ' . (($robotsFollow)? '':'rel="nofollow"') . '>$1</a>', $returnText);
    $returnText = preg_replace($hashPattern, '<a href="http://twitter.com/#!/search?q=%23$1" ' . (($robotsFollow)? '':'rel="nofollow"') . '>#$1</a>', $returnText);
    $returnText = preg_replace($mentionPattern, '<a href="http://twitter.com/$1" ' . (($robotsFollow)? '':'rel="nofollow"') . '>@$1</a>', $returnText);
    return $returnText;
}

However if I have a tweet like:

“@WOWPicsOfLife: Tickling a turtle. http://t.co/rqHVQvhqdO”

The result will be:

“<a href="http://twitter.com/WOWPicsOfLife" rel="nofollow">@WOWPicsOfLife</a>: Tickling a turtle. <a href="http://t.co/rqHVQvhqdO”" rel="nofollow">http://t.co/rqHVQvhqdO”</a>

As you can see, the closing quote was included in the last link, which breaks it.

I presume this happens because the quote sits directly after the URL with no space in between, so the pattern treats it as part of the link. How do I fix this? Perhaps by amending the regex to ignore quote marks?

Cameron

1 Answer

The key, of course, is in your

  $urlPattern = '/(http[s]?\:\/\/[^\s]+)/i';

and specifically in [^\s]+, which says that every character that is not whitespace is part of the URL. You need to restrict it to a list of "safe" characters that can legitimately be part of a URL. I don't think a regex can do this perfectly for every possible URL, but this approach strongly mitigates the problem.
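
As a rough sketch of that idea (not a complete URL matcher), the pattern below swaps [^\s]+ for a whitelist of the characters RFC 3986 allows in a URL, and additionally requires the match to end on a character that rarely closes a sentence. The function name linkUrls and the choice of "safe ending" characters are assumptions made for illustration; they are not part of the original code.

```php
<?php
// Hypothetical helper: links URLs using a whitelist of URL characters
// instead of "anything that is not whitespace".
function linkUrls($text) {
    // Characters RFC 3986 permits in a URL (unreserved, reserved, and "%" for escapes).
    // Curly quotes and ASCII double quotes are not in this list, so they end the match.
    $urlChars = 'A-Za-z0-9\-._~:\/?#\[\]@!$&\'()*+,;=%';

    // Assumption: a URL should not end in sentence punctuation such as "." or ")",
    // so the last matched character is limited to a narrower set.
    $urlPattern = '/(https?:\/\/[' . $urlChars . ']*[A-Za-z0-9\/_~=#%&-])/i';

    return preg_replace($urlPattern, '<a href="$1" rel="nofollow">$1</a>', $text);
}

echo linkUrls('“@WOWPicsOfLife: Tickling a turtle. http://t.co/rqHVQvhqdO”');
```

With the tweet from the question, the closing ” now stays outside the anchor tag. Note that the hash and mention replacements that run afterwards can still match a # or @ inside an already linked URL, so this only addresses the trailing-quote problem.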

See also this question: Characters allowed in a URL.

ShinTakezou