I'm having a bit of a problem converting plain text to a URL. What I'd like is this: if I have text like www.google.com, it's converted to

<a href="www.google.com" target="_blank">www.google.com</a>

I'm kind of a RegEx noob, but I tried this:

$description = preg_replace('@(www.([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', '<a href="$1" target="_blank">$1</a>', $description);

The description variable is a piece of text, which CAN contain unconverted URLs.

With the code above, I get this as link:

<a target="_blank">www.google.com</a>

So the href part is left out. This must be a piece of cake for you RegEx wizards out there, so thanks in advance for any help.

If there is another (better?) way to convert plain text to URLs, you can say so and I'll try it.

samn
  • I've tried running your code and it does work perfectly. Which php version are you using? – Roberto Feb 23 '12 at 10:01
  • Can you post an example value for `$description`? – Roberto Feb 23 '12 at 10:13
  • Here you go: En je bent overal welkom als je maar breeddenkend bent!" Tempo (www.temponieuwsbrief.be) mocht op kotbezoek! – samn Feb 23 '12 at 10:14
  • Either you found a bug in PHP or you're not debugging correctly. That text does work in PHP 5.3.3, 5.3.6 and 5.3.10. Run the contents of http://pastebin.com/YqqQRSnV in its own file and let me know if that works. – Roberto Feb 23 '12 at 10:18
  • I'm not a PHP guy, but I fail to see how this could be a regex issue. Your replacement string is static and has href in it, so how could the regex remove it? It must be happening downstream. – Scott Weaver Feb 23 '12 at 10:20
  • Here is a very concise answer: http://stackoverflow.com/a/1188652/851498 – Florian Margaine Feb 23 '12 at 10:21
  • Okay, you can see the problem here: http://cap47fb.com/hub/youtube/. In the first large chunk of text, the conversion is perfect. When you click on the first image, a green box should show up with the same piece of text. In this text the conversion in the URL is not right. The RegEx code is exactly the same. Also, when the box is closed, the first link doesn't work anymore... – samn Feb 23 '12 at 10:33

4 Answers

If your only problem is that the link incorrectly points towards www.google.com instead of the fully qualified URL, such as http://www.google.com, then the correct replacement would be:

$description = preg_replace('@(www.([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', '<a href="http://$1" target="_blank">$1</a>', $description);
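
To see the effect of that replacement, here is a minimal sketch that runs it on a sample string. The pattern and replacement are copied from the line above; the `$pattern`, `$replacement` and `$linked` variable names are only illustrative, and the sample is a shortened version of the text posted in the comments.

```php
<?php
// Pattern copied unchanged from the answer; only the replacement adds "http://".
$pattern = '@(www.([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@';
$replacement = '<a href="http://$1" target="_blank">$1</a>';

// Illustrative sample, shortened from the text posted in the comments.
$description = 'Tempo (www.temponieuwsbrief.be) mocht op kotbezoek!';

// $1 is the whole matched address, so it is used for both href and link text.
$linked = preg_replace($pattern, $replacement, $description);
echo $linked, "\n";
```

The closing parenthesis after the address is not part of the match, so it stays outside the generated link.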
Roberto

<a href="www.example.com">www.example.com</a> will not work correctly in browsers, because an href without a scheme is treated as a relative URL and resolved against the current page, e.g. http://example.com/www.example.com. You need to specify the protocol, i.e. http, https, etc.

The following will replace all text "links" starting with www, ftp, http, https or file with HTML a tags:

<?php

    $pattern = '/(www|ftp|http|https|file)(:\/\/)?[\S]+(\b|$)/i';
    $string = 'hello http://example.com https://graph.facebook.com    http://www.example.com www.google.com';

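    function create_a_tags( $matches ){

        $url = $matches[0];
        // bare "www" links need a scheme, otherwise the href is treated as a relative URL
        // (the pattern is case-insensitive, so compare in lower case)
        if ( 'www' == strtolower( $matches[1] ) ){
            $url = 'http://' . $matches[0];
        }
        // escape both the href value and the link text for HTML output
        $escaped = htmlspecialchars($matches[0]);
        return sprintf( '<a href="%s">%s</a>', htmlspecialchars($url), $escaped );
    }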

    echo preg_replace_callback( $pattern, 'create_a_tags', $string );

?>

prints

hello <a href="http://example.com">http://example.com</a>
<a href="https://graph.facebook.com">https://graph.facebook.com</a>
<a href="http://www.example.com">http://www.example.com</a>
<a href="http://www.google.com">www.google.com</a>
rodneyrehm
scibuff
  • But what if the text is like this: www.google.com, and I want to get it like this: ? – samn Feb 23 '12 at 10:48
  • I've edited the code above to handle www urls as well (by adding http:// to the href attribute) but it may now create some false positives (I haven't tested it) – scibuff Feb 23 '12 at 12:18

Quite a while ago we compared different approaches to URL verification and identification. See the table of regular expressions.

I suggest you drop your regex and use Gruber's revised pattern instead. A (PHP 5.3) solution could look like:

<?php

$string = 'hello 
http://example.com 
https://graph.facebook.com 
http://www.example.com
www.google.com
ftp://example.com';

$string = preg_replace_callback('#(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))#iS', function($m) {
    // use http as default protocol, if none given
    if (strpos($m[0], '://') === false) {
        $m[0] = 'http://' . $m[0];
    }
    // text -> html is a context switch, take care of special characters
    $_m = htmlspecialchars($m[0]);
    return '<a href="' . $_m . '" target="_blank">' . $_m . '</a>';
}, $string);

echo $string, "\n";
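
The "context switch" comment above is easier to appreciate with a URL that contains characters special to HTML. A minimal sketch, assuming a made-up URL in `$url`; the explicit `ENT_QUOTES` and `'UTF-8'` arguments are my choice here, the answer itself relies on the defaults:

```php
<?php
// Made-up URL containing characters that are special in HTML.
$url = 'http://example.com/search?q=a&b="c"';

// ENT_QUOTES escapes both double and single quotes, plus &, < and >.
$safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

// Without escaping, the double quote in the URL would end the href attribute early.
echo '<a href="' . $safe . '">' . $safe . '</a>', "\n";
```

The browser decodes the entities again when it reads the attribute, so the link still points at the original URL.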
rodneyrehm
  • There isn't anything fundamentally wrong with the regex he's currently using - it's the generated markup that doesn't look valid (no scheme on the href) – AD7six Feb 23 '12 at 12:14
  • I never said there was anything wrong with his regex. I just explained there's a better one. Also, this solution is the only one sanitizing the URL for use in HTML. Something I do think is important to mention. If you're interested only in answering the core question without looking at the bigger picture - be my guest and downvote all you want… – rodneyrehm Feb 23 '12 at 12:20
  • It's not compiling well, I get this error: Parse error: syntax error, unexpected T_CONSTANT_ENCAPSED_STRING (on $string = preg_replace_callback('#(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z...) – samn Feb 23 '12 at 12:53
  • rodneyrehm - there is "bigger picture" and then there's misdirection. – AD7six Feb 23 '12 at 13:21

I've found the solution. It indeed had nothing to do with the RegEx, which was correct. My coworker had added this line of jQuery code in the head:

$("a").removeAttr('href');

So obviously the href attribute was being removed. I didn't look at this because I was sure this was a php/regex problem. Removing this fixed the problem.
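
For anyone hitting something similar: one way to tell a server-side problem from a client-side one is to check the markup PHP actually produced before any script touches it. A rough sketch, assuming the generated markup is available in a hypothetical `$html` variable (the sample value below is made up):

```php
<?php
// Hypothetical stand-in for the markup the PHP code produced.
$html = 'Tempo (<a href="http://www.temponieuwsbrief.be" target="_blank">www.temponieuwsbrief.be</a>)';

// Count all anchors, and the anchors that carry an href, in the server-side output.
$anchors = preg_match_all('/<a\b/i', $html, $m);
$withHref = preg_match_all('/<a\b[^>]*\bhref=/i', $html, $m);

echo "anchors: $anchors, with href: $withHref\n";
```

If both counts match but the links in the browser have no href, the attribute is being removed after the page loads, which points to JavaScript rather than the regex.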

I realize this was a stupid error and it was impossible for you to solve this, so thanks all for helping, +1 to you guys.

samn