URL detection problems

Question

I'm currently having some problems detecting URLs and making them clickable. Until now it always worked fine, probably because we always tested with real URLs, but now that the website is live, we're having some problems.

This is the code we used to detect them before:

$content = preg_replace('!(((f|ht)tp://)[-a-zA-Zа-яА-Я()0-9@:%_+.~#?&;//=]+)!i', '<a href="$1" target="_blank">$1</a>', $content);
$content = eregi_replace('([[:space:]()[{}])(www.[-a-zA-Z0-9@:%_\+.~#?&//=]+)', '\\1<a href="\\2" target="_blank">\\2</a>', $content);

It did a great job for normal URLs, but some URLs are causing problems:

- hk.linkedin.com
- www.test.com
- test.com

Also notice that some URLs don't have http in front of them.

I'm really not that good with regex, so I would very much appreciate it if somebody could help me figure this out.

Why don't you tell us a bit about what you are trying to achieve and what "problem" you are facing? — Adrian Shum, Nov 16 '11 at 11:32
I always remove `http://` from any URL I send through a regex pattern. As it is either exact or non existent it can be done with a simple `str_replace('http://','',$url)` before you run it through the pattern. Note: this will remove `http://` from any `urlencoded()` strings passed in the URL. — David Barker, Nov 16 '11 at 11:37
But I can't figure out how to detect links like hk.linkedin.com or just linkedin.com — woutr_be, Nov 16 '11 at 12:33
Similar to http://stackoverflow.com/questions/910912/extract-urls-from-text-in-php and http://stackoverflow.com/questions/7769065/remove-urls-from-text-string/7769903#7769903 — Herbert, Nov 16 '11 at 12:55

score 0 · Answer 1 · answered Nov 16 '11 at 12:44

What exactly do you want to get? In this example I can see a blatant lack of understanding of regular expressions... but then, according to Google Code Search, this exact code is used in a few projects. Those patterns were made to find URLs in the middle of text (not everything that looks like a URL is one, but if it contains http:// or www it is sure to be a URL).

Not everything needs to be done with regular expressions alone. They are helpful, but sometimes they create additional problems.

One problem with regular expressions is that they cannot apply conditions to the result. You can use multiple regular expressions, but there is a chance that something will go wrong (such as one expression affecting what a previous one has done). Just look at this. It uses an additional callback function (you could use the e modifier instead, but it may make the code unreadable, and it was deprecated in PHP 5.5 and removed in PHP 7).

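<?php
// Link anything that looks like a URL; the scheme (group 1) is optional.
$content = preg_replace_callback('{\b(?:(https?|ftp)://)?(\S+[.]\S+)\b}i',
                                 'addHTTP', $content);

// Callback: $matches[0] is the whole match, $matches[1] the scheme (if any),
// $matches[2] the rest of the URL.
function addHTTP($matches) {
    if (empty($matches[1])) {
        // No scheme was written, so assume http://.
        return '<a href="http://' . $matches[2] . '">http://' . $matches[2] . '</a>';
    }
    else {
        // A scheme is present: use the whole match so it is kept in the link.
        return '<a href="' . $matches[0] . '">' . $matches[0] . '</a>';
    }
}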
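
For the three hosts from the question (hk.linkedin.com, www.test.com, test.com), a stricter variant of the callback approach might look like the sketch below. It is not a definitive implementation. The function name linkify and the host pattern are my own assumptions, it assumes PHP 7 or later, and it assumes $text is plain text that has not been HTML-escaped yet. The host pattern is deliberately simple, so it will also link things that merely look like host names, such as file names or the domain part of an e-mail address.

```php
<?php
// Hypothetical helper (not from the original answer): link URLs with or
// without a scheme in a single pass and HTML-escape what goes into the tag.
function linkify(string $text): string
{
    // Optional scheme, then dot-separated host labels ending in a
    // letters-only label, then an optional path.
    $pattern = '{\b(?:(https?|ftp)://)?((?:[a-z0-9-]+\.)+[a-z]{2,}(?:/[^\s<]*)?)}i';

    return preg_replace_callback($pattern, function (array $m): string {
        // $m[1] is an empty string when no scheme was written; assume http://.
        $url   = ($m[1] === '' ? 'http://' : $m[1] . '://') . $m[2];
        $href  = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $label = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
        return '<a href="' . $href . '">' . $label . '</a>';
    }, $text);
}

echo linkify('See hk.linkedin.com, www.test.com or http://test.com/a?b=1'), "\n";
```

The link text stays as the user typed it and only the href gets the assumed http:// prefix. Whether that is what you want depends on your site.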

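Or use two regular expressions (a little harder to understand). The first links URLs that already have a scheme; the second links everything else that looks like a host name and prefixes it with http://.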

$content = preg_replace('{\b(?:(?:https?|ftp)://)\S+[.]\S+\b}i',
                        '<a href="$0">$0</a>', $content);
$content = preg_replace('{\b(?<!["\'=><.])[-a-zA-Zа-яА-Яа-яА-Я()0-9@:%_+.~#?&;//=]+[.][-a-zA-Zа-яА-Яа-яА-Я()0-9@:%_+.~#?&;//=]+(?!["\'=><.])\b}i',
                        '<a href="http://$0">http://$0</a>', $content);
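
The lookbehind and lookahead in the second pattern appear intended to keep it away from text that the first pass has already wrapped. A minimal sketch with simplified, hypothetical patterns (not the ones above) shows what can happen when a second pass has no such guard:

```php
<?php
// Simplified, hypothetical patterns used only to illustrate the pitfall.
$text = 'Visit http://example.com today';

// Pass 1: wrap URLs that carry an explicit scheme.
$pass1 = preg_replace('{\bhttps?://\S+}i', '<a href="$0">$0</a>', $text);

// Pass 2: wrap bare host names, with no guard against existing markup.
$pass2 = preg_replace('{\b[a-z0-9-]+(?:\.[a-z0-9-]+)+\b}i',
                      '<a href="http://$0">$0</a>', $pass1);

echo $pass1, "\n";
echo $pass2, "\n"; // the host name inside the first link gets wrapped again
```

The second pass finds example.com both inside the href attribute and inside the link text, so the markup produced by the first pass is broken. This is why a single pass with a callback is usually easier to get right.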

Also, you should avoid using target="". Users don't expect a new window to appear when they click a link. After clicking such a link, a user might wonder why the "Back" button doesn't work (hint: the new window made the history disappear). Anyone who really wants to open a link in a new window can do it themselves (it's not hard...).

Note that this kind of linking is usually combined with other helpers. For example, Stack Overflow uses a modified Markdown that does more intelligent conversions, such as turning plain-text lists into HTML lists... But that all depends on what you need. If you only need to process links, you can try using those regexes, but well...

Yes indeed, I totally lack knowledge of regex; I googled my problem and used that code. — woutr_be, Nov 16 '11 at 13:07
