I am using the following to automatically add <a> tags to any detected URL in a comment, before insertion into the database.

$pattern = "@\b(https?://)?(([0-9a-zA-Z_!~*'().&=+$%-]+:)?[0-9a-zA-Z_!~*'().&=+$%-]+\@)?(([0-9]{1,3}\.){3}[0-9]{1,3}|([0-9a-zA-Z_!~*'()-]+\.)*([0-9a-zA-Z][0-9a-zA-Z-]{0,61})?[0-9a-zA-Z]\.[a-zA-Z]{2,6})(:[0-9]{1,4})?((/[0-9a-zA-Z_!~*'().;?:\@&=+$,%#-]+)*/?)@";

$text_with_hyperlink = stripslashes(preg_replace($pattern, '<a href="\0" class="oembed">\0</a>', $body));

Everything works great, except that I want any URLs typed without 'http://' to have it added to the beginning of the URL.

e.g.

With the above code a comment containing 'come visit our site http://www.facebook.com'

returns come visit our site <a href="http://www.facebook.com">http://www.facebook.com</a>

However if a user types 'come visit our site www.facebook.com'

I wish it to return the url complete with an http:// prefix.

How would I go about modifying my code to produce this kind of detection?

EDIT: My apologies for failing to mention originally that the solution should ideally also be capable of detecting non-www. domains such as m.facebook.com or facebook.com.

gordyr
  • http://stackoverflow.com/questions/910912/extract-urls-from-text-in-php – Zul Jan 23 '12 at 13:08
  • and this: http://stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior – Zul Jan 23 '12 at 13:09
  • and this http://css-tricks.com/snippets/php/find-urls-in-text-make-links/ – Zul Jan 23 '12 at 13:10
  • Wrong, wrong and wrong. Have you even read the question? – Madara's Ghost Jan 23 '12 at 13:19
  • Zulkhaery, thanks for the suggestions but none of these solutions seem to detect the presence of the http:// prefix and add it if it is omitted which is what I need. They do however offer other (and potentially better) methods of detecting the URL. So thank you for that. – gordyr Jan 23 '12 at 13:20

2 Answers

One quick and dirty solution would be to replace www. with http://www., as follows:

$body = preg_replace("|(?<!http://)(?<!https://)(www\.\S+)|", "http://$1", $body);

Place it before the code that adds the <a> tags; it will transform every www.links.com into http://www.links.com.

Madara's Ghost
  • The only problem with that is if the user doesn't include the www (e.g. facebook.com) or has another subdomain (e.g. www1.facebook.com or m.facebook.com). Ultimately, what should be done is a preg match to see if the string is a URL and then check to see if the first part matches http or https, and if not, add it. – imkingdavid Jan 23 '12 at 13:31
  • Yes well, this is endless. You need to set a standard and follow it, otherwise I'll always be able to find another tweak to break the regex. – Madara's Ghost Jan 23 '12 at 13:36
  • Thanks Truth... Your solution does indeed work exactly as requested in the question. However, I will wait an hour or two before marking it as answered (Stefan's answer performs the same as your own) in the hope that someone can offer a solution which does take into account non-www. domains. My apologies for not being clearer in the question; it was an oversight. – gordyr Jan 23 '12 at 14:01
  • The thing is, it's impossible to determine whether a URL is valid or not without visiting it, and that might open the door to a world of hurt. You can't know whether facebook.com is real just as much as you can't know if dsfjsdfsd.co.me is real. – Madara's Ghost Jan 23 '12 at 14:05
  • Indeed... And this is a point I am taking seriously. Covering all variations of URLs will not be possible. However, I certainly would like to include the more common forms such as stackoverflow.com and m.facebook.com etc. if at all possible. – gordyr Jan 23 '12 at 14:06
  • Well, what you could do is allow users to easily force links (SO uses `<>` to do that). You could implement something similar and instruct your users to use it if they want links in their question. – Madara's Ghost Jan 23 '12 at 14:10
  • An interesting and simple solution that unfortunately isn't quite the direction I want to go in... That said, I've slightly modified your and Stefan's answers to include the detection of several other types of URL (most notably m.facebook.com etc.), which I feel is sufficient. Anything more would be unrealistic using these methods. Although both answers given here do the job along with my minor modification, I will be awarding you the answer as you beat Stefan to post. Many thanks. – gordyr Jan 23 '12 at 14:14

Maybe this here is what you are looking for: http://snippets.dzone.com/posts/show/6156

//Edit: What about this one:

<?php
$body = $_GET['body'];
// Match a bare domain (no scheme) preceded by whitespace: group 1 is the whitespace, group 2 the domain.
$pattern = "/(\\s+)((?:[a-z][a-z\\.\\d\\-]+)\\.(?:[a-z][a-z\\-]+))(?![\\w\\.])/is";
// Keep the whitespace outside the link and prefix the domain with http://.
$text_with_hyperlink = preg_replace($pattern, '$1<a href="http://$2" class="oembed">$2</a>', $body);
echo $text_with_hyperlink;
?>

(Very dirty, I know...)

Stefan
  • Thanks Stefan... Please see my comment on Truth's answer as it applies to your own also. Thanks for the good suggestion, however. – gordyr Jan 23 '12 at 14:02
  • use this as regex: $regex='((?:[a-z][a-z\\.\\d\\-]+)\\.(?:[a-z][a-z\\-]+))(?![\\w\\.])'; – Stefan Jan 23 '12 at 14:51
  • sorry the last one causes delimiter problems, this one works for me: "((?:[a-z][a-z\\.\\d\\-]+)\\.(?:[a-z][a-z\\-]+))(?![\\w\\.])/is" – Stefan Jan 23 '12 at 15:09
  • This actually works perfectly in all cases apart from when the user types the http://. In such a case I get the http:// twice: once as text in the text string and a second time within the URL. Very close, however! – gordyr Jan 23 '12 at 15:10
  • Ahh.. just seen your edit. It detects all domains correctly but doesn't add the http:// if it is missing.. If I add the http:// into the 'template' then I get the double http:// problem. You really are so close stefan :-) – gordyr Jan 23 '12 at 15:23
  • What about just checking whether there's whitespace before the address, so it will auto-link this one: xyz.com but not this one: http://xyz.com. You can then use another function for URLs with "http://". The regex would then be: /(\\s+)((?:[a-z][a-z\\.\\d\\-]+)\\.(?:[a-z][a-z\\-]+))(?![\\w\\.])/is – Stefan Jan 23 '12 at 15:23
  • Your last edit adds the http:// in when it is omitted from the text... but doesn't create the link when it is added. I really appreciate the help though. You've really given it a good attempt. Many thanks Stefan! – gordyr Jan 23 '12 at 19:08
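
For reference, the approach suggested in the comments on the first answer (match the whole URL, then check whether it starts with http or https and add the prefix if not) can be done in a single pass with preg_replace_callback, which also avoids the double http:// problem described above. The following is only a minimal sketch under stated assumptions: the function name autolink and the simplified pattern are hypothetical and are not taken from either answer or from the original pattern in the question. As noted in the thread, no regex can tell whether a domain is real, so strings such as file.txt will also be linked.

```php
<?php
// Hypothetical helper (not from the thread): wrap detected URLs in <a> tags,
// adding http:// to the href when the user typed no scheme.
function autolink(string $body): string
{
    // Simplified pattern, assumed for this sketch only:
    //   group 1: optional http:// or https://
    //   group 2: dot-separated labels, a 2-6 letter TLD, and an optional path
    // The lookbehind stops matches from starting mid-word, inside an e-mail
    // address, or inside the path of another URL.
    $pattern = '~(?<![\w@./-])(https?://)?((?:[a-z0-9-]+\.)+[a-z]{2,6}\b(?:/[^\s<]*)?)~i';

    return preg_replace_callback($pattern, function (array $m): string {
        // $m[1] is an empty string when no scheme was typed.
        $scheme = $m[1] !== '' ? $m[1] : 'http://';
        $href = htmlspecialchars($scheme . $m[2], ENT_QUOTES);
        $text = htmlspecialchars($m[0], ENT_QUOTES);
        return '<a href="' . $href . '" class="oembed">' . $text . '</a>';
    }, $body);
}

echo autolink('come visit our site www.facebook.com');
// come visit our site <a href="http://www.facebook.com" class="oembed">www.facebook.com</a>
```

Because the scheme is an optional capture group inside one pattern, URLs typed with http:// are linked unchanged and bare domains such as facebook.com or m.facebook.com get the prefix only in the href, so neither case needs a second replacement pass.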