Possible Duplicate:
PHP Linkify Links In Content

I'm a little stuck on finding plain-text links and wrapping them in A tags.

So far I'm using `/ [\w]*\.[a-z]{2,}/i` to find the links. It works fine for links like stackoverflow.com, but it misses the www. prefix or anything else before the domain.

To recap, I'm trying to find all links and wrap them in A tags. None of the text contains the protocol part (http(s)://) or a port part, which makes it a tad harder.

TheNextBigThing
  • @ajreal: I doubt any of the DOM methods can detect www.example.com text patterns. – mario Dec 10 '11 at 15:55
  • @mario I think the bigger problem is choosing the wrong method. Doing an XQuery to get all the anchor tags would minimize the complexity. I know you are good at regex, maybe you can give some advice? – ajreal Dec 10 '11 at 16:01
  • @ajreal: The *input* isn't HTML. It's plain text. It's a dupe question, no doubt, but difficult to google. The OP's problem is that he doesn't have real URLs, just domain names. He does need a regex with address bar magic. – mario Dec 10 '11 at 16:04
  • This question has been asked and answered before (e.g. See: [PHP Linkify Links In Content](http://stackoverflow.com/q/5080826/433790)). You need to avoid already linked URLs and need to make sure the URLs are valid before you put them into the `href` attribute. e.g. Putting `example.com` in a link won't work - you need to put in `http://example.com` (by itself, `example.com` is treated as a path, not a domain host). This problem is not trivial and there are a lot of 'gotchas'. See: [The Problem with URLS](http://www.codinghorror.com/blog/2008/10/the-problem-with-urls.html). – ridgerunner Dec 10 '11 at 18:59
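
As a rough sketch of the point in the last comment (the `href` needs a scheme even when the visible text has none), something like the following could work. The function name `linkify_bare_domains` and the pattern are assumptions made for illustration, not taken from the thread, and the sketch assumes the input is plain text with no existing anchors.

```php
<?php
// Sketch only: linkify_bare_domains() and its pattern are illustrative assumptions.
function linkify_bare_domains(string $text): string
{
    // One or more dot-terminated labels followed by a TLD of two or more letters.
    $pattern = '/\b(?:[a-z0-9][a-z0-9-]*\.)+[a-z]{2,}\b/i';

    return preg_replace_callback($pattern, function (array $m): string {
        $host = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
        // Without a scheme the browser treats "example.com" as a relative path,
        // so the href gets "http://" while the link text stays as written.
        return '<a href="http://' . $host . '">' . $host . '</a>';
    }, $text);
}

echo linkify_bare_domains('Visit www.example.com or stackoverflow.com today.'), "\n";
```

Using a callback rather than a plain replacement string keeps the visible text and the `href` separate, so the scheme is added only where it is needed.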

2 Answers

$text = preg_replace('@((?:http(?:s)?://)?(?:www)?([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $text);
haynar
  • Your answer could use some elaboration. Explain how you arrived at this specific pattern. – Madara's Ghost Dec 10 '11 at 18:14
  • There was a small issue, I've corrected it. – haynar Dec 10 '11 at 18:20
  • First it checks for `http://` or `https://`, to match both http://www.example.com and www.example.com. Then it checks for the `www` string, to match both www.example.com and example.com. Next comes the standard pattern for a URL, consisting of any number of letters, dashes and dots. Then there may be a port number, and at the end it checks the rest of the URL (after the domain name). – haynar Dec 10 '11 at 18:26
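
The parts listed in that explanation can be spelled out one per line with the `x` flag. This is a hedged variant, not the pattern from the answer: it requires at least one dot and a TLD so that ordinary words are not matched, and it adds `http://` to the `href` when no scheme is present. The names `URL_PATTERN` and `linkify_with_scheme` are hypothetical.

```php
<?php
// Illustrative variant, not the answer's original pattern.
const URL_PATTERN = '~
    \b
    (?<scheme>https?://)?                  # optional protocol
    (?<host>(?:[a-z0-9-]+\.)+[a-z]{2,})    # labels (www included) plus a TLD
    (?<port>:\d+)?                         # optional port
    (?<path>/[\w/.-]*(?:\?[^\s<]*)?)?      # optional path and query string
~ix';

function linkify_with_scheme(string $text): string
{
    return preg_replace_callback(URL_PATTERN, function (array $m): string {
        $url = $m[0];
        // Keep an existing scheme, otherwise assume plain http.
        $href = $m['scheme'] !== '' ? $url : 'http://' . $url;

        return sprintf(
            '<a href="%s">%s</a>',
            htmlspecialchars($href, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($url, ENT_QUOTES, 'UTF-8')
        );
    }, $text);
}

echo linkify_with_scheme('see https://example.com:8080/a/b?x=1 and www.example.org/docs'), "\n";
```

One known limitation of this sketch: a full stop directly after a path (as in "example.org/docs.") is swallowed into the link, so trailing punctuation would need separate handling.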

Can't find a good duplicate now, so try something simple like repeating the prefix:

 /\b(\w[\w-]+\.)+[a-z]{2,}\b/i

I wouldn't use this; too many false positives. But you haven't really limited the scope. Alternatives include e.g. a fixed list of TLDs to make it a bit more specific.
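
A minimal sketch of the fixed-TLD idea, reusing the repeated prefix from the pattern above. The list in `KNOWN_TLDS` is a made-up short sample (a real list would be far longer), and `find_domains` is a hypothetical helper name.

```php
<?php
// Hypothetical sample list; not a complete set of TLDs.
const KNOWN_TLDS = ['com', 'net', 'org', 'io'];

function find_domains(string $text, array $tlds = KNOWN_TLDS): array
{
    // Escape each TLD and join them into an alternation: com|net|org|io
    $alternation = implode('|', array_map(fn ($t) => preg_quote($t, '/'), $tlds));

    // Same repeated prefix as above, but the TLD must come from the list.
    $pattern = '/\b(?:\w[\w-]+\.)+(?:' . $alternation . ')\b/i';

    preg_match_all($pattern, $text, $matches);

    return $matches[0];
}

print_r(find_domains('THANK YOU EVERYONE you WIN.com, see file.txt or www.example.org'));
```

With this sample list, `file.txt` is skipped because `txt` is not a listed TLD, which removes one common class of false positives.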

mario
  • Thank you, I'm using this expression but with an added space before the \b: `$count = null; $htmlified = preg_replace('/ \b([\w-]+\.)+([a-z]{2,})\b/i', '$1$2', 'THANK YOU EVERYONE you WIN.com', -1, $count);` – TheNextBigThing Dec 10 '11 at 16:32