15

I want to match a URL in a wall post and replace it with an anchor tag. For this I use the regular expression below.

I would like to match four types of URL:

  1. http://example.com
  2. https://example.com
  3. www.example.com
  4. example.com

preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@',
             '<a href="$1">$1</a>', $subject);

This expression matches only the first two types of URL.

If I use this expression instead, '@(www?([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', it matches only the third type of URL.

How can I match all four types of URL with a single regular expression?

Peter Mortensen
Seema

8 Answers

18

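A complete working example using the regex from the link in Nev Stokes' answer:

// Written as a class method; drop "public" to use it as a plain function.
public function clickableUrls($html){
    // Wrap every match (group 1 is the whole URL) in an anchor tag.
    return preg_replace(
        '%\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s',
        '<a href="$1">$1</a>',
        $html
    );
}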
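
As a rough check against the four URL forms in the question, the sketch below runs the same pattern over a sample string. The standalone copy of `clickableUrls` and the sample text are assumptions made for illustration. Under this pattern a bare example.com has neither a scheme nor a www. prefix, so it should stay unlinked, and www. matches end up with a scheme-less href.

```php
<?php
// Standalone copy of the method above, for illustration only.
function clickableUrls(string $html): string
{
    return preg_replace(
        '%\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s',
        '<a href="$1">$1</a>',
        $html
    );
}

// Sample input (an assumption): the four URL forms from the question.
$sample = 'http://example.com https://example.com www.example.com example.com';
echo clickableUrls($sample), "\n";
```

If the www. form should produce a working link, the href needs a scheme prepended, which a plain replacement string cannot do conditionally; a callback is one way to handle that.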
Peter Mortensen
Mārtiņš Briedis
  • My goodness, finally this one works... I've been trying all kinds of ones that people have posted; they either have trouble with the syntax or only partially work (what I needed to fix was that periods at the end of the URL were being picked up, like t.co/123213...) – kn00tcn Jan 29 '13 at 07:11
  • Worked perfectly. – ABCD Jan 27 '20 at 18:35
18

To be honest, I'd use a different regex, such as this one that Gruber posted in 2009:

\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))

Or this updated version that Gruber posted in 2010 (thanks, @IMSoP):

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
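
A minimal sketch of how the 2010 pattern might be applied in PHP follows. The function name `linkUrls`, the `~` delimiter, the `u` flag, and the choice to prepend http:// to scheme-less matches are all assumptions, not part of Gruber's post. Note that this pattern only picks up a bare domain when it is followed by a slash, so a plain example.com is left alone.

```php
<?php
// Gruber's 2010 pattern, copied from the answer above. A nowdoc keeps quotes
// and backslashes literal; "~" is the delimiter and "u" enables UTF-8 mode.
$pattern = <<<'REGEX'
~(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))~u
REGEX;

// Hypothetical helper: wrap each match in an anchor tag.
function linkUrls(string $text, string $pattern): string
{
    return preg_replace_callback($pattern, function (array $m): string {
        $url = $m[1];
        // Assumption: matches without a scheme should link over http.
        $href = preg_match('~^[a-z][\w-]+:~i', $url) ? $url : 'http://' . $url;
        return '<a href="' . htmlspecialchars($href) . '">'
            . htmlspecialchars($url) . '</a>';
    }, $text);
}

echo linkUrls('Try www.example.com or example.com/page.', $pattern), "\n";
```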
Peter Mortensen
Nev Stokes
  • 2
    Note that there is a newer version of that regex here: http://daringfireball.net/2010/07/improved_regex_for_matching_urls – IMSoP Aug 18 '12 at 17:07
  • 2
    Implemented in PHP: http://stackoverflow.com/a/10002262/1055533 – Oskar Aug 26 '13 at 21:05
2

I looked around and didn't see any that were exactly what I needed. I found this one that was close, so I modified it as follows:

^((([hH][tT][tT][pP][sS]?)\:\/\/)?([\w\\-]+(\[\w\.\&%\$\-]+)*)?((([^\s\(\)\<\>\\\"\.\[\]\,;:]+)(\.[^\s\(\)\<\>\\\"\.\[\]\,;:]+)*(\.[a-zA-Z]{2,4}))|((([01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}([01]?\d{1,2}|2[0-4]\d|25[0-5])))(\b\:(6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3}|0)\b)?((\/[^\/][\w\.\,\?\'\\\/\+&%\$#\=~_\-]*)*[^\.\,\?\"\'\(\)\[\]!;<>{}\s\x7F-\xFF])?)$

Check it out on debuggex.

Peter Mortensen
uxtx
2

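Use the following (it is anchored with ^ and $, so it validates a whole string rather than finding URLs inside text):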

preg_match("/^((https|http|ftp)\:\/\/)?([a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-zA-Z]{2,4}|[a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-zA-Z]{2,4}|[a-z0-9A-Z]+\.[a-zA-Z]{2,4})$/i", $url)
Peter Mortensen
Aldo Bassanini
1

I just came across this post (two years later). You may already have your answer, but for beginners: you can use the following regular expression to strip every type of URL or query string:

(https|http|ftp)\:\/\/|([a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-zA-Z]{2,4})|([a-z0-9A-Z]+\.[a-zA-Z]{2,4})|\?([a-zA-Z0-9]+[\&\=\#a-z]+)

It will strip every type of URL. Take a look at the following list. I used a variety of domains for those who want to ask "will it strip .us, .in, .pk, etc. domains or not?"

  1. ftp://www.web.com
  2. web.net
  3. www.website.info
  4. website.us
  5. web.ws?query=true
  6. www.web.biz?query=true
  7. ftp://web.in?query=true
  8. media.google.com
  9. ns.google.pk
  10. ww1.smart.au
  11. www3.smart.br
  12. w1.smart.so
  13. ?ques==two&t=p
  14. http://website.info?ques==two&t=p
  15. https://www.weborwebsite.com

Working Example (tested in PHP5+, Apache2+):

$str = "ftp://www.web.com, web.net, www.website.info, website.us, web.ws?query=true, www.web.biz?query=true, ftp://web.in?query=true, media.google.com hello world, working more with ns ns.google.pk or ww1.smart.au and www3.smart.br w1.smart.so ?ques==two&t=p http://website.info?ques==two&t=p https://www.weborwebsite.com and ftp://www.hotmail.br";
echo preg_replace("/(https|http|ftp)\:\/\/|([a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-zA-Z]{2,4})|([a-z0-9A-Z]+\.[a-zA-Z]{2,4})|\?([a-zA-Z0-9]+[\&\=\#a-z]+)/i", "", $str);

It will return:

, , , , , , , hello world, working more with ns or and and
Peter Mortensen
Adnan
0

Use this pattern (wrap it in delimiters, and add the i flag if needed, before passing it to a preg_* function):

$regex = "(https?\:\/\/|ftp\:\/\/|www\.|[a-z0-9-]+)+([a-z0-9-]+)\.+([a-z]{2,4})((\/|\.)+([a-z0-9-_.\/]*)$|$)";
Peter Mortensen
M Rostami
0

If you want to make that one work, you need to make the "https?://" part optional. Since you seem to have a fairly good grasp of regexps, I won't show you how. It is an exercise for the reader :)

But I generally agree with Nev. It's overly complicated for what it does.
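
For readers who would rather see the exercise worked out, here is one possible sketch with the scheme made optional. It is not the asker's regex verbatim: the name `linkAllFour`, the requirement of a dot followed by a letters-only ending, and the http:// default for scheme-less matches are assumptions. Because a bare domain looks like many other dotted tokens, this will also link things such as file names (notes.txt), so treat it as a starting point.

```php
<?php
// Hypothetical helper covering all four URL forms from the question.
function linkAllFour(string $subject): string
{
    // Group 1 is the whole URL, group 2 the optional scheme.
    $pattern = '@\b((https?://)?(?:www\.)?[-\w]+(?:\.[-\w]+)*\.[a-z]{2,}'
             . '(?::\d+)?(?:/[\w/_.-]*(?:\?\S+)?)?)@i';

    return preg_replace_callback($pattern, function (array $m): string {
        // Group 2 is absent from $m when no scheme was matched.
        $href = (empty($m[2]) ? 'http://' : '') . $m[1];
        return '<a href="' . $href . '">' . $m[1] . '</a>';
    }, $subject);
}

echo linkAllFour('http://example.com, www.example.com and example.com'), "\n";
```

The callback is used because a plain replacement string cannot add the scheme only when it is missing.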

Peter Mortensen
dutt
0

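This works well for me, including a mailto check:

function LinkIt($text)
{
    // $2 is the optional "s", $3 the "www." prefix, $4 the rest of the URL.
    // Note: the href always starts with "http", so ftp:// URLs are linked as http://.
    $t = preg_replace("/(\b(?:(?:http(s)?|ftp):\/\/|(www\.)))([-a-züöäß0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|])/im", '<a target="_blank" href="http$2://$3$4" class="external-link" title="External Link">$1$4</a>', $text);
    // strtolower() only lowercases the replacement template; the matched address keeps its case.
    return preg_replace("/([\w+\.\-]+@[\w+\-]+\.[a-zA-Z]{2,4})/im", strtolower('<a href="mailto:$1" class="mail" title="E-Mail">$1</a>'), $t);
}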
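
If the intent of the strtolower() call was to lowercase the e-mail address itself, a callback is one way to do that, since the replacement string is lowercased before any match is substituted. The sketch below assumes that intent; `linkEmails` is a hypothetical name and the pattern is the one from the mail step above.

```php
<?php
// Hypothetical variant of the mail step: lowercase the matched address itself.
function linkEmails(string $text): string
{
    return preg_replace_callback(
        '/([\w+.\-]+@[\w+\-]+\.[a-zA-Z]{2,4})/',
        function (array $m): string {
            $address = strtolower($m[1]);
            return '<a href="mailto:' . $address . '" class="mail" title="E-Mail">'
                . $address . '</a>';
        },
        $text
    );
}

echo linkEmails('Mail John.Doe@Example.COM now'), "\n";
```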