Regular expression to find URLs within a string

Question

Does anyone know of a regular expression I could use to find URLs within a string? I've found a lot of regular expressions on Google for determining if an entire string is a URL but I need to be able to search an entire string for URLs. For example, I would like to be able to find www.google.com and http://yahoo.com in the following string:

Hello www.google.com World http://yahoo.com

I am not looking for specific URLs in the string. I am looking for ALL of the URLs in the string which is why I need a regular expression.

For PHP: `preg_match_all('#\bhttps?://[^\s()<>]+(?:$[\w\d]+$|([^[:punct:]\s]|/))#', $string, $match);` from https://stackoverflow.com/q/910912/1066234 — Avatar, Aug 06 '21 at 05:03

score 286 · Answer 1 · edited Oct 03 '21 at 13:11

286

This is the one I use

(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])

Works for me, should work for you too.

edited Oct 03 '21 at 13:11

Adam

6,041
36
120
208

answered May 18 '11 at 08:37

Rajeev

4,571
2
22
35

The inclusion of `&` in the expression is fishy. Where is this supposed to be used in? – nhahtdh Jul 30 '15 at 17:34
12

Don't forget to escape the forward slashes. – Mark Jul 08 '17 at 06:53
3

It's 2017, and unicode domain names are all over the place. `\w` may not match international symbols (depends on regex engine), the range is needed instead: `a-zA-Z0-9\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF`. – Michael Antipin Aug 29 '17 at 13:34
4

This is fine for general purpose, but there are many cases that it doesn't catch. This enforces that your links are prefixed with a protocol. If choose to ignore protocols, endings of emails are accepted as it is the case with test@testing.com. – Squazz Sep 07 '17 at 08:09
7

shouldn't `[\w_-]` be `[\w-]`? because `\w` matches `_` already. per [mozilla docs](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#special-word) – Sang Nov 04 '17 at 07:19
The answer provided by @Stefan Henze works much better (read: detects more valid URLs) than this one. – rinogo Nov 09 '17 at 14:56
11

Upvoted but This answer does not work what the question is asking `www.yahoo.com`. `"""(http|ftp|https)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?""".r.findAllIn("www.google.com").toList` . ALSO LACKS EXPLANATION for answer – prayagupa Nov 11 '17 at 23:58
2

I did a slight change to allow http://, https:// or ftp:// to be optional: `^((http|ftp|https)://)?([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?` – Justin Levene Apr 19 '18 at 18:14
Using the regex provided by @JustinLevene did not have the proper escape sequences on the back-slashes. Updated to now be correct, and added in condition to match the FTP protocol as well: Will match to all urls with or without protocols, and with out without "www." `^((http|ftp|https):\/\/)?([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?` – Justin E. Samuels Oct 16 '18 at 17:19
1

Updated to include the `://` in the returned string: `(http|ftp|https)(:\/\/)([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?` – Zach Feb 04 '19 at 13:23
`((http|ftp|https)\:\/\/)?([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?` https://regex101.com/r/4XfRNi/2 – mplungjan Jun 15 '20 at 08:32
Awesome, thanks. Imho the `(?:(?:\.[\w_-]+)+)` should be optional, though, in order to catch links like `http://localhost/foo` (or any other hostname within the local search domain). Also I have removed the `?` from the last character range as I think it is more likely to be a normal punctuation character. – Roben Mar 12 '21 at 16:33
2

The problem here is someone can write shorthanded URLS, such as "example.com" – Joseph Kreifels II Mar 19 '21 at 18:56

score 67 · Answer 2 · answered Mar 26 '15 at 21:08

67

Guess no regex is perfect for this use. I found a pretty solid one here

/(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$])/igm

Some differences / advantages compared to the other ones posted here:

It does not match email addresses
It does match localhost:12345
It won't detect something like moo.com without http or www

See here for examples

answered Mar 26 '15 at 21:08

Stefan Henze

2,711
23
22

7

it matches www.e This is not a valid url – Ihor Herasymchuk Dec 20 '16 at 22:46
3

The `g` option isn't valid in all regular expression implementations (e.g. Ruby's built-in implementation). – Huliax Jan 17 '20 at 13:23
This one worked a lot better, thanks! – Cake Princess Feb 27 '23 at 09:10

score 47 · Answer 3 · edited Dec 22 '20 at 18:46

47

text = """The link of this question: https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string
Also there are some urls: www.google.com, facebook.com, http://test.com/method?param=wasd, http://test.com/method?param=wasd&params2=kjhdkjshd
The code below catches all urls in text and returns urls in list."""

urls = re.findall('(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-&?=%.]+', text)
print(urls)

Output:

[
    'https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string', 
    'www.google.com', 
    'facebook.com',
    'http://test.com/method?param=wasd',
    'http://test.com/method?param=wasd&params2=kjhdkjshd'
]

edited Dec 22 '20 at 18:46

nicolasassi

490
4
14

answered Feb 13 '18 at 14:56

GooDeeJAY

1,681
2
20
27

Kotlin val urlRegex = "(?:(?:https?|ftp):\\/\\/)?[\\w/\\-?=%.]+\\.[\\w/\\-?=%.]+" – Akshay Nandwana Feb 27 '19 at 06:32
2

Misses `&` parameters in the url. e.g. `http://test.com/method?param=wasd&param2=wasd2` misses param2 – TrophyGeek May 18 '19 at 21:38
1

also lacks support for URLs with # – nicolasassi Dec 22 '20 at 19:53
@TrophyGeek I think you just copied the regex from the first comment, and Akshay forgot to include the `&`. The right version would be: `val urlRegex = "(?:(?:https?|ftp):\\/\\/)?[\\w/\\-?=%.]+\\.[\\w/\\-&?=%.]+"` – Alec Jan 21 '22 at 17:16
1

This also thinks `hello...` is a URL – mathematics-and-caffeine Mar 23 '22 at 08:12

wongz · Answer 4 · 2021-02-24T02:38:13.953

Wrote one up myself:

let regex = /([\w+]+\:\/\/)?([\w\d-]+\.)*[\w-]+[\.\:]\w+([\/\?\=\&\#\.]?[\w-]+)*\/?/gm

It works on ALL of the following domains:

https://www.facebook.com
https://app-1.number123.com
http://facebook.com
ftp://facebook.com
http://localhost:3000
localhost:3000/
unitedkingdomurl.co.uk
this.is.a.url.com/its/still=going?wow
shop.facebook.org
app.number123.com
app1.number123.com
app-1.numbEr123.com
app.dashes-dash.com
www.facebook.com
facebook.com
fb.com/hello_123
fb.com/hel-lo
fb.com/hello/goodbye
fb.com/hello/goodbye?okay
fb.com/hello/goodbye?okay=alright
Hello www.google.com World http://yahoo.com
https://www.google.com.tr/admin/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
https://google.com.tr/test/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
http://google.com/test/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
ftp://google.com/test/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
www.google.com.tr/test/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
www.google.com/test/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
drive.google.com/test/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
https://www.example.pl
http://www.example.com
www.example.pl
example.com
http://blog.example.com
http://www.example.com/product
http://www.example.com/products?id=1&page=2
http://www.example.com#up
http://255.255.255.255
255.255.255.255
shop.facebook.org/derf.html

You can see how it performs here on regex101 and adjust as needed

Your regex missed this when I tested it. It only caught part of the URL: shop.facebook.org/derf.html — David Rector, Feb 23 '21 at 07:50
@DavidRector Thanks! You are absolutely correct. I have updated the regex string and regex101 url based on your feedback. Added a \. at the end of the second last pair of square brackets [ ] — wongz, Feb 24 '21 at 02:39
This also matches any string of the form `alphanum_char.alphanum_char`, for example, `a.r`, `b.4`, `7.e`, etc. These aren't valid URLs. — Princy, Jun 17 '21 at 23:24

Squazz · Answer 5 · 2017-09-07T07:59:22.497

None of the solutions provided here solved the problems/use-cases I had.

What I have provided here, is the best I have found/made so far. I will update it when I find new edge-cases that it doesn't handle.

\b
  #Word cannot begin with special characters
  (?<![@.,%&#-])
  #Protocols are optional, but take them with us if they are present
  (?<protocol>\w{2,10}:\/\/)?
  #Domains have to be of a length of 1 chars or greater
  ((?:\w|\&\#\d{1,5};)[.-]?)+
  #The domain ending has to be between 2 to 15 characters
  (\.([a-z]{2,15})
       #If no domain ending we want a port, only if a protocol is specified
       |(?(protocol)(?:\:\d{1,6})|(?!)))
\b
#Word cannot end with @ (made to catch emails)
(?![@])
#We accept any number of slugs, given we have a char after the slash
(\/)?
#If we have endings like ?=fds include the ending
(?:([\w\d\?\-=#:%@&.;])+(?:\/(?:([\w\d\?\-=#:%@&;.])+))*)?
#The last char cannot be one of these symbols .,?!,- exclude these
(?<![.,?!-])

Is there any way to make this javascript friendly? As named capturing groups are not fully functional there, so the protocol value check does not validate. — einord, Feb 07 '20 at 07:26

score 9 · Answer 6 · answered Dec 10 '17 at 05:43

I think this regex pattern handle precisely what you want

/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/

and this is an snippet example to extract Urls:

// The Regular Expression filter
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";

// The Text you want to filter for urls
$text = "The text you want  https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string to filter goes here.";

// Check if there is a url in the text
preg_match_all($reg_exUrl, $text, $url,$matches);
var_dump($matches);

score 8 · Answer 7 · answered Nov 12 '17 at 12:29

8

If you have to be strict on selecting links, I would go for:

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

For more infos, read this:

An Improved Liberal, Accurate Regex Pattern for Matching URLs

answered Nov 12 '17 at 12:29

Tommaso Belluzzo

23,232
8
74
98

4

Don't do that. http://www.regular-expressions.info/catastrophic.html It'll kill your app... – Auric Nov 28 '17 at 19:22

score 7 · Answer 8 · answered Jun 22 '16 at 06:33

7

All of the above answers are not match for Unicode characters in URL, for example: http://google.com?query=đức+filan+đã+search

For the solution, this one should work:

(ftp:\/\/|www\.|https?:\/\/){1}[a-zA-Z0-9u00a1-\uffff0-]{2,}\.[a-zA-Z0-9u00a1-\uffff0-]{2,}(\S*)

answered Jun 22 '16 at 06:33

Duc Filan

6,769
3
21
26

2

Unicode characters were forbidden as per the RFC 1738 on URLs (http://www.faqs.org/rfcs/rfc1738.html). They would have to be percent encoded to be standards compliant - although I think it may have changed more recently - worth reading https://www.w3.org/International/articles/idn-and-iri/ – mrswadge Sep 07 '16 at 09:41
@mrswadge I just cover the cases. We're not sure if all people care about the standard. Thank you for your info. – Duc Filan Sep 12 '16 at 02:54
1

Only this one worked perfectly for me having urls such as "http://www.example.com" "www.exmaple.com" "https://example.com" "ftp://example.co.in" "http://www.exmaple.com/?q='me'" – Krissh Jan 30 '20 at 07:43

score 6 · Answer 9 · edited Mar 04 '18 at 13:10

6

I used below regular expression to find url in a string:

/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/

edited Mar 04 '18 at 13:10

Xantium

11,201
10
62
89

answered Jan 19 '15 at 10:47

aditya

89
1
2

3

`[a-zA-Z]{2,3}` is really poor for matching TLD, see official list: https://data.iana.org/TLD/tlds-alpha-by-domain.txt – Toto Jan 19 '15 at 11:04

score 6 · Answer 10 · edited Jan 08 '19 at 07:23

6

I found this which covers most sample links, including subdirectory parts.

Regex is:

(?:(?:https?|ftp):\/\/|\b(?:[a-z\d]+\.))(?:(?:[^\s()<>]+|\((?:[^\s()<>]+|(?:\([^\s()<>]+\)))?\))+(?:\((?:[^\s()<>]+|(?:\(?:[^\s()<>]+\)))?\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))?

edited Jan 08 '19 at 07:23

Klaus Gütter

11,151
6
31
36

answered Jan 08 '19 at 06:37

Thilanka Bowala

61
1
3

When I tried this, the ends of sentences were marked as a match. In the above sentence, the last word "match" and the period were matched. – David Rector Feb 23 '21 at 07:55

bafsar · Answer 11 · 2017-11-12T12:26:48.650

4

Short and simple. I have not tested in javascript code yet but It looks it will work:

((http|ftp|https):\/\/)?(([\w.-]*)\.([\w]*))

Code on regex101.com

edited Nov 12 '17 at 12:26

answered Nov 12 '17 at 12:18

bafsar

1,080
2
16
16

1

I liked your regex because it was exactly what I was looking for: I needed to identify and strip URLs out of some text, not validate. Worked in rails. – Dagmar Aug 16 '19 at 05:17
@Dagmar I am glad to hear that :) – bafsar Aug 17 '19 at 02:18

score 4 · Answer 12 · answered Oct 16 '18 at 17:24

Using the regex provided by @JustinLevene did not have the proper escape sequences on the back-slashes. Updated to now be correct, and added in condition to match the FTP protocol as well: Will match to all urls with or without protocols, and with out without "www."

Code: ^((http|ftp|https):\/\/)?([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?

Example: https://regex101.com/r/uQ9aL4/65

Mindaugas Jaraminas · Answer 13 · 2020-03-12T13:56:40.910

4

Here a little bit more optimized regexp:

(?:(?:(https?|ftp|file):\/\/|www\.|ftp\.)|([\w\-_]+(?:\.|\s*\[dot\]\s*[A-Z\-_]+)+))([A-Z\-\.,@?^=%&amp;:\/~\+#]*[A-Z\-\@?^=%&amp;\/~\+#]){2,6}?

Here is test with data: https://regex101.com/r/sFzzpY/6

edited Mar 12 '20 at 13:56

answered Mar 12 '20 at 13:45

Mindaugas Jaraminas

3,261
2
24
37

1

Your test shows some of your URL's are not being detected fully. This entire string should be marked as a match: https://stackoverflow.com/questions/60619430/cant-get-newest-data-from-dao-unless-restart-the-app – David Rector Feb 23 '21 at 07:53

Xantium · Answer 14 · 2018-02-06T19:59:32.277

A probably too simplistic, but working method might be:

[localhost|http|https|ftp|file]+://[\w\S(\.|:|/)]+

I tested it on Python and as long as the string parsing contains a space before and after and none in the url (which I have never seen before) it should be fine.

Here is an online ide demonstrating it

However here are some benefits of using it:

It recognises file: and localhost as well as ip addresses
It will never match without them
It does not mind unusual characters such as # or - (see url of this post)

score 3 · Answer 15 · answered Feb 21 '19 at 13:33

3

I use this Regex:

/((\w+:\/\/\S+)|(\w+[\.:]\w+\S+))[^\s,\.]/ig

It works fine for many URLs, like: http://google.com, https://dev-site.io:8080/home?val=1&count=100, www.regexr.com, localhost:8080/path, ...

answered Feb 21 '19 at 13:33

Sillas Camara

76
1
5

Dragana Le Mitova · Answer 16 · 2023-06-07T19:58:40.930

IMPROVED

Detects Urls like these:

https://www.example.pl
http://www.example.com
www.example.pl
example.com
http://blog.example.com
http://www.example.com/product
http://www.example.com/products?id=1&page=2
http://www.example.com#up
http://255.255.255.255
255.255.255.255
http:// www.site.com:8008

Regex:

/^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+$/gm

Please note that working with URLs and domain validation can be complex, and regex alone may not cover all edge cases. For more comprehensive URL validation, it's recommended to use specialized libraries or built-in URL validation functions provided by your programming language or framework.

score 3 · Answer 17 · answered May 17 '11 at 22:54

3

If you have the url pattern, you should be able to search for it in your string. Just make sure that the pattern doesnt have ^ and $ marking beginning and end of the url string. So if P is the pattern for URL, look for matches for P.

answered May 17 '11 at 22:54

manojlds

290,304
63
469
417

This is the regex I found that verifies if an entire string is a URL. I took out the ^ at the beggining and the $ at the end like you said and it still didn't work. What am I doing wrong? `^(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*[^\.\,\)\(\s]$` – user758263 May 17 '11 at 23:19
It might help if you showed what language you're using. Either way, be sure to check `http://regexpal.com/`; there you can test different expressions against your string until you get it right. – entonio May 17 '11 at 23:37
@user758263 - do you really need such a complex regex for the url? Depends on what the possible urls you might actually find. Also see http://gskinner.com/RegExr/ for trying out regex. They also have hundreds of samples on the right under the `Community` tab including ones for urls – manojlds May 18 '11 at 00:06
I'm trying to look for all possible URLs and I'm using C++. Thanks for the links entonio and manojlds. The gskinner site was especially helpful since it had samples. – user758263 May 18 '11 at 15:11

ran8 · Answer 18 · 2020-07-04T01:09:37.847

I liked Stefan Henze 's solution but it would pick up 34.56. Its too general and I have unparsed html. There are 4 anchors for a url;

www ,

http:\ (and co) ,

. followed by letters and then / ,

or letters . and one of these: https://ftp.isc.org/www/survey/reports/current/bynum.txt .

I used lots of info from this thread. Thank you all.

"(((((http|ftp|https|gopher|telnet|file|localhost):\\/\\/)|(www\\.)|(xn--)){1}([\\w_-]+(?:(?:\\.[\\w_-]+)+))([\\w.,@?^=%&:\\/~+#-]*[\\w@?^=%&\\/~+#-])?)|(([\\w_-]{2,200}(?:(?:\\.[\\w_-]+)*))((\\.[\\w_-]+\\/([\\w.,@?^=%&:\\/~+#-]*[\\w@?^=%&\\/~+#-])?)|(\\.((org|com|net|edu|gov|mil|int|arpa|biz|info|unknown|one|ninja|network|host|coop|tech)|(jp|br|it|cn|mx|ar|nl|pl|ru|tr|tw|za|be|uk|eg|es|fi|pt|th|nz|cz|hu|gr|dk|il|sg|uy|lt|ua|ie|ir|ve|kz|ec|rs|sk|py|bg|hk|eu|ee|md|is|my|lv|gt|pk|ni|by|ae|kr|su|vn|cy|am|ke))))))(?!(((ttp|tp|ttps):\\/\\/)|(ww\\.)|(n--)))"

Above solves just about everything except a string like "eurls:www.google.com,facebook.com,http://test.com/", which it returns as a single string. Tbh idk why I added gopher etc. Proof R code

if(T){
  wierdurl<-vector()
  wierdurl[1]<-"https://JP納豆.例.jp/dir1/納豆 "
  wierdurl[2]<-"xn--jp-cd2fp15c.xn--fsq.jp "
  wierdurl[3]<-"http://52.221.161.242/2018/11/23/biofourmis-collab"
  wierdurl[4]<-"https://12000.org/ "
  wierdurl[5]<-"  https://vg-1.com/?page_id=1002 "
  wierdurl[6]<-"https://3dnews.ru/822878"
  wierdurl[7]<-"The link of this question: https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string
  Also there are some urls: www.google.com, facebook.com, http://test.com/method?param=wasd
  The code below catches all urls in text and returns urls in list. "
  wierdurl[8]<-"Thelinkofthisquestion:https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string
  Alsotherearesomeurls:www.google.com,facebook.com,http://test.com/method?param=wasd
  Thecodebelowcatchesallurlsintextandreturnsurlsinlist. "
  wierdurl[9]<-"Thelinkofthisquestion:https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-stringAlsotherearesomeurlsZwww.google.com,facebook.com,http://test.com/method?param=wasdThecodebelowcatchesallurlsintextandreturnsurlsinlist."
  wierdurl[10]<-"1facebook.com/1res"
  wierdurl[11]<-"1facebook.com/1res/wat.txt"
  wierdurl[12]<-"www.e "
  wierdurl[13]<-"is this the file.txt i need"
  wierdurl[14]<-"xn--jp-cd2fp15c.xn--fsq.jpinspiredby "
  wierdurl[15]<-"[xn--jp-cd2fp15c.xn--fsq.jp/inspiredby "
  wierdurl[16]<-"xnto--jpto-cd2fp15c.xnto--fsq.jpinspiredby "
  wierdurl[17]<-"fsety--fwdvg-gertu56.ffuoiw--ffwsx.3dinspiredby "
  wierdurl[18]<-"://3dnews.ru/822878 "
  wierdurl[19]<-" http://mywebsite.com/msn.co.uk "
  wierdurl[20]<-" 2.0http://www.abe.hip "
  wierdurl[21]<-"www.abe.hip"
  wierdurl[22]<-"hardware/software/data"
  regexstring<-vector()
  regexstring[2]<-"(http|ftp|https)://([\\w_-]+(?:(?:\\.[\\w_-]+)+))([\\w.,@?^=%&:/~+#-]*[\\w@?^=%&/~+#-])?"
  regexstring[3]<-"/(?:(?:https?|ftp|file):\\/\\/|www\\.|ftp\\.)(?:\\([-A-Z0-9+&@#\\/%=~_|$?!:,.]*\\)|[-A-Z0-9+&@#\\/%=~_|$?!:,.])*(?:\\([-A-Z0-9+&@#\\/%=~_|$?!:,.]*\\)|[A-Z0-9+&@#\\/%=~_|$])/igm"
  regexstring[4]<-"[a-zA-Z0-9\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]?"
  regexstring[5]<-"((http|ftp|https)\\:\\/\\/)?([\\w_-]+(?:(?:\\.[\\w_-]+)+))([\\w.,@?^=%&:/~+#-]*[\\w@?^=%&/~+#-])?"
  regexstring[6]<-"((http|ftp|https):\\/\\/)?([\\w_-]+(?:(?:\\.[\\w_-]+)+))([\\w.,@?^=%&:\\/~+#-]*[\\w@?^=%&\\/~+#-])?"
  regexstring[7]<-"(http|ftp|https)(:\\/\\/)([\\w_-]+(?:(?:\\.[\\w_-]+)+))([\\w.,@?^=%&:/~+#-]*[\\w@?^=%&/~+#-])?"
  regexstring[8]<-"(?:(?:https?|ftp|file):\\/\\/|www\\.|ftp\\.)(?:\\([-A-Z0-9+&@#/%=~_|$?!:,.]*\\)|[-A-Z0-9+&@#/%=~_|$?!:,.])*(?:\\([-A-Z0-9+&@#/%=~_|$?!:,.]*\\)|[A-Z0-9+&@#/%=~_|$])"
  regexstring[10]<-"((http[s]?|ftp):\\/)?\\/?([^:\\/\\s]+)((\\/\\w+)*\\/)([\\w\\-\\.]+[^#?\\s]+)(.*)?(#[\\w\\-]+)?"
  regexstring[12]<-"http[s:/]+[[:alnum:]./]+"
  regexstring[9]<-"http[s:/]+[[:alnum:]./]+" #in DLpages 230
  regexstring[1]<-"[[:alnum:]-]+?[.][:alnum:]+?(?=[/ :])" #in link_graphs 50
  regexstring[13]<-"^(?!mailto:)(?:(?:http|https|ftp)://)(?:\\S+(?::\\S*)?@)?(?:(?:(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[0-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-z\\u00a1-\\uffff]{2,})))|localhost)(?::\\d{2,5})?(?:(/|\\?|#)[^\\s]*)?$"
  regexstring[14]<-"(((((http|ftp|https):\\/\\/)|(www\\.)|(xn--)){1}([\\w_-]+(?:(?:\\.[\\w_-]+)+))([\\w.,@?^=%&:\\/~+#-]*[\\w@?^=%&\\/~+#-])?)|(([\\w_-]+(?:(?:\\.[\\w_-]+)*))((\\.((org|com|net|edu|gov|mil|int)|(([:alpha:]{2})(?=[, ]))))|([\\/]([\\w.,@?^=%&:\\/~+#-]*[\\w@?^=%&\\/~+#-])?))))(?!(((ttp|tp|ttps):\\/\\/)|(ww\\.)|(n--)))"
  regexstring[15]<-"(((((http|ftp|https|gopher|telnet|file|localhost):\\/\\/)|(www\\.)|(xn--)){1}([\\w_-]+(?:(?:\\.[\\w_-]+)+))([\\w.,@?^=%&:\\/~+#-]*[\\w@?^=%&\\/~+#-])?)|(([\\w_-]{2,200}(?:(?:\\.[\\w_-]+)*))((\\.[\\w_-]+\\/([\\w.,@?^=%&:\\/~+#-]*[\\w@?^=%&\\/~+#-])?)|(\\.((org|com|net|edu|gov|mil|int|arpa|biz|info|unknown|one|ninja|network|host|coop|tech)|(jp|br|it|cn|mx|ar|nl|pl|ru|tr|tw|za|be|uk|eg|es|fi|pt|th|nz|cz|hu|gr|dk|il|sg|uy|lt|ua|ie|ir|ve|kz|ec|rs|sk|py|bg|hk|eu|ee|md|is|my|lv|gt|pk|ni|by|ae|kr|su|vn|cy|am|ke))))))(?!(((ttp|tp|ttps):\\/\\/)|(ww\\.)|(n--)))"
    }

for(i in wierdurl){#c(7,22)
  for(c in regexstring[c(15)]) {
    print(paste(i,which(regexstring==c)))
    print(str_extract_all(i,c))
  }
}

MTeveZ · Answer 19 · 2022-12-06T15:22:31.467

3

Wasn't easy one, but managed to compose a short and efficient regex pattern to match URLs, also captures email addresses. Hope that works for you.

((\bhttp(|s)|ftp|file):\/\/)|\bwww[ ]*\.[ ]*([a-zA-Z0-9%:?#@\/=_-]*)|([a-zA-Z0-9%:.?#@\/=_-]*)[ ]*\.[ ]*(com|eu|org|co|uk|pdf|etc)

This can be tested here regexr.com

edited Dec 06 '22 at 15:22

answered Nov 30 '22 at 09:55

MTeveZ

31
2

score 2 · Answer 20 · answered Apr 30 '20 at 11:43

I have utilize c# Uri class and it works, well with IP Address, localhost

 public static bool CheckURLIsValid(string url)
    {
        Uri returnURL;

       return (Uri.TryCreate(url, UriKind.Absolute, out returnURL)
           && (returnURL.Scheme == Uri.UriSchemeHttp || returnURL.Scheme == Uri.UriSchemeHttps));


    }

score 2 · Answer 21 · answered Mar 09 '21 at 16:31

2

This regex is perfectly working for me, should work for you too

(http|ftp|https)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?

answered Mar 09 '21 at 16:31

Muhammad Numan

23,222
6
63
80

score 1 · Answer 22 · edited Jun 20 '20 at 09:12

1

This is a slight improvement on/adjustment to (depending on what you need) Rajeev's answer:

([\w\-_]+(?:(?:\.|\s*\[dot\]\s*[A-Z\-_]+)+))([A-Z\-\.,@?^=%&amp;:/~\+#]*[A-Z\-\@?^=%&amp;/~\+#]){2,6}?

See here for an example of what it does and does not match.

I got rid of the check for "http" etc as I wanted to catch url's without this. I added slightly to the regex to catch some obfuscated urls (i.e. where user's use [dot] instead of a "."). Finally I replaced "\w" with "A-Z" to and "{2,3}" to reduce false positives like v2.0 and "moo.0dd".

Any improvements on this welcome.

edited Jun 20 '20 at 09:12

Community

1
1

answered Jan 19 '15 at 10:43

avjaarsveld

579
6
9

`[a-zA-Z]{2,3}`is really poor for matching TLD, see official list: https://data.iana.org/TLD/tlds-alpha-by-domain.txt. Also your regex matches `_.........&&&&&&` not sure it's a valid url. – Toto Jan 19 '15 at 11:06
Thanks for that JE SUIS CHAELIE, any suggestions for improvement (especially for the false positive)? – avjaarsveld Jan 19 '15 at 16:31

score 0 · Answer 23 · answered Aug 26 '14 at 18:37

0

I use the logic of finding text between two dots or periods

the regex below works fine with python

(?<=\.)[^}]*(?=\.)

answered Aug 26 '14 at 18:37

faisal00813

400
5
12

score 0 · Answer 24 · answered Nov 03 '16 at 15:11

0

Matching a URL in a text should not be so complex

(?:(?:(?:ftp|http)[s]*:\/\/|www\.)[^\.]+\.[^ \n]+)

https://regex101.com/r/wewpP1/2

answered Nov 03 '16 at 15:11

naT erraT

502
1
5
19

score 0 · Answer 25 · edited Jan 10 '18 at 18:14

0

I used this

^(https?:\\/\\/([a-zA-z0-9]+)(\\.[a-zA-z0-9]+)(\\.[a-zA-z0-9\\/\\=\\-\\_\\?]+)?)$

edited Jan 10 '18 at 18:14

Panciz

2,183
2
30
54

answered Jan 10 '18 at 16:36

Maikon Ayres Da Silva

9
1

score 0 · Answer 26 · answered Dec 17 '19 at 23:29

(?:vnc|s3|ssh|scp|sftp|ftp|http|https)\:\/\/[\w\.]+(?:\:?\d{0,5})|(?:mailto|)\:[\w\.]+\@[\w\.]+

If you want an explanation of each part, try in regexr[.]com where you will get a great explanation of every character.

This is split by an "|" or "OR" because not all useable URI have "//" so this is where you can create a list of schemes as or conditions that you would be interested in matching.

Md. Jahurul Islam · Answer 27 · 2021-04-30T05:56:09.603

0

How about this one?

(http:\/\/|ftp:\/\/|https:\/\/|www\.)([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?

It matches both in the question.

edited Apr 30 '21 at 05:56

answered Apr 30 '21 at 04:59

Md. Jahurul Islam

51
7

score 0 · Answer 28 · answered May 12 '21 at 12:19

This slightly simpler version of GooDeeJAY's answer serves me well (and supports e.g. # and other characters at the expense of increasing 'false positives'):

import re
text = """The link of this question: https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string
Also there are some urls: www.google.com, facebook.com, http://test.com/method?param=wasd, http://test.com/method?param=wasd&params2=kjhdkjshd#changed
The code below catches all urls in text and returns urls in list."""

regex = r"(?i)(https?://|www.|\w+\.)[^\s]+"
urls = [match.group() for match in re.finditer(regex, text)]
print(urls)

and outputs

[
'https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string', 
'www.google.com,', 
'facebook.com,', 
'http://test.com/method?param=wasd,', 
'http://test.com/method?param=wasd&params2=kjhdkjshd#changed'
]

Görkem Hacıoğlu · Answer 29 · 2021-09-03T10:44:17.940

0

This expression also finds paths like: /path/text.html

(https?\:\/[^\"\'\n\<\>\;\)\s]*)|(www?\.[^\"\'\n\<\>\;\s]*)|([^\s\&\=\;\,\<\<\>\"\'\(\)]+\/[\w\/])([^\"\'\n\;\s]*)|((?<!\<)[\/]+[\w]+[^\'\"\s\<\>]*)

edited Sep 03 '21 at 10:44

answered Sep 03 '21 at 07:09

Görkem Hacıoğlu

171
1
9

score 0 · Answer 30 · answered Jan 13 '22 at 19:20

This regex catches even domains like goo.gl or domains.google.

/(?:(?:((http|ftp|https):){0,1}\/\/)|www.|)([\w_-]+(?:(?:\.[\w_-]+)*))(\.(aaa|aarp|abarth|abb|abbott|abbvie|abc|able|abogado|abudhabi|ac|academy|accenture|accountant|accountants|aco|active|actor|ad|adac|ads|adult|ae|aeg|aero|aetna|af|afamilycompany|afl|africa|ag|agakhan|agency|ai|aig|aigo|airbus|airforce|airtel|akdn|al|alfaromeo|alibaba|alipay|allfinanz|allstate|ally|alsace|alstom|am|amazon|americanexpress|americanfamily|amex|amfam|amica|amsterdam|an|analytics|android|anquan|anz|ao|aol|apartments|app|apple|aq|aquarelle|ar|arab|aramco|archi|army|arpa|art|arte|as|asda|asia|associates|at|athleta|attorney|au|auction|audi|audible|audio|auspost|author|auto|autos|avianca|aw|aws|ax|axa|az|azure|ba|baby|baidu|banamex|bananarepublic|band|bank|bar|barcelona|barclaycard|barclays|barefoot|bargains|baseball|basketball|bauhaus|bayern|bb|bbc|bbt|bbva|bcg|bcn|bd|be|beats|beauty|beer|bentley|berlin|best|bestbuy|bet|bf|bg|bh|bharti|bi|bible|bid|bike|bing|bingo|bio|biz|bj|bl|black|blackfriday|blanco|blockbuster|blog|bloomberg|blue|bm|bms|bmw|bn|bnl|bnpparibas|bo|boats|boehringer|bofa|bom|bond|boo|book|booking|boots|bosch|bostik|boston|bot|boutique|box|bq|br|bradesco|bridgestone|broadway|broker|brother|brussels|bs|bt|budapest|bugatti|build|builders|business|buy|buzz|bv|bw|by|bz|bzh|ca|cab|cafe|cal|call|calvinklein|cam|camera|camp|cancerresearch|canon|capetown|capital|capitalone|car|caravan|cards|care|career|careers|cars|cartier|casa|case|caseih|cash|casino|cat|catering|catholic|cba|cbn|cbre|cbs|cc|cd|ceb|center|ceo|cern|cf|cfa|cfd|cg|ch|chanel|channel|charity|chase|chat|cheap|chintai|chloe|christmas|chrome|chrysler|church|ci|cipriani|circle|cisco|citadel|citi|citic|city|cityeats|ck|cl|claims|cleaning|click|clinic|clinique|clothing|cloud|club|clubmed|cm|cn|co|coach|codes|coffee|college|cologne|com|comcast|commbank|community|company|compare|computer|comsec|condos|construction|consulting|contact|contractors|cooking|cookingchannel|cool|coop|corsica|country|coupon|coupons|courses|cpa|cr|credit|creditcard|creditunion|cricket|crown|crs|cruise|cruises|csc|cu|cuisinella|cv|cw|cx|cy|cymru|cyou|cz|dabur|dad|dance|data|date|dating|datsun|day|dclk|dds|de|deal|dealer|deals|degree|delivery|dell|deloitte|delta|democrat|dental|dentist|desi|design|dev|dhl|diamonds|diet|digital|direct|directory|discount|discover|dish|diy|dj|dk|dm|dnp|do|docs|doctor|dodge|dog|doha|domains|doosan|dot|download|drive|dtv|dubai|duck|dunlop|duns|dupont|durban|dvag|dvr|dz|earth|eat|ec|eco|edeka|edu|education|ee|eg|eh|email|emerck|energy|engineer|engineering|enterprises|epost|epson|equipment|er|ericsson|erni|es|esq|estate|esurance|et|etisalat|eu|eurovision|eus|events|everbank|exchange|expert|exposed|express|extraspace|fage|fail|fairwinds|faith|family|fan|fans|farm|farmers|fashion|fast|fedex|feedback|ferrari|ferrero|fi|fiat|fidelity|fido|film|final|finance|financial|fire|firestone|firmdale|fish|fishing|fit|fitness|fj|fk|flickr|flights|flir|florist|flowers|flsmidth|fly|fm|fo|foo|food|foodnetwork|football|ford|forex|forsale|forum|foundation|fox|fr|free|fresenius|frl|frogans|frontdoor|frontier|ftr|fujitsu|fujixerox|fun|fund|furniture|futbol|fyi|ga|gal|gallery|gallo|gallup|game|games|gap|garden|gay|gb|gbiz|gd|gdn|ge|gea|gent|genting|george|gf|gg|ggee|gh|gi|gift|gifts|gives|giving|gl|glade|glass|gle|global|globo|gm|gmail|gmbh|gmo|gmx|gn|godaddy|gold|goldpoint|golf|goo|goodhands|goodyear|goog|google|gop|got|gov|gp|gq|gr|grainger|graphics|gratis|green|gripe|grocery|group|gs|gt|gu|guardian|gucci|guge|guide|guitars|guru|gw|gy|hair|hamburg|hangout|haus|hbo|hdfc|hdfcbank|health|healthcare|help|helsinki|here|hermes|hgtv|hiphop|hisamitsu|hitachi|hiv|hk|hkt|hm|hn|hockey|holdings|holiday|homedepot|homegoods|homes|homesense|honda|honeywell|horse|hospital|host|hosting|hot|hoteles|hotels|hotmail|house|how|hr|hsbc|ht|htc|hu|hughes|hyatt|hyundai|ibm|icbc|ice|icu|id|ie|ieee|ifm|iinet|ikano|il|im|imamat|imdb|immo|immobilien|in|inc|industries|infiniti|info|ing|ink|institute|insurance|insure|int|intel|international|intuit|investments|io|ipiranga|iq|ir|irish|is|iselect|ismaili|ist|istanbul|it|itau|itv|iveco|iwc|jaguar|java|jcb|jcp|je|jeep|jetzt|jewelry|jio|jlc|jll|jm|jmp|jnj|jo|jobs|joburg|jot|joy|jp|jpmorgan|jprs|juegos|juniper|kaufen|kddi|ke|kerryhotels|kerrylogistics|kerryproperties|kfh|kg|kh|ki|kia|kim|kinder|kindle|kitchen|kiwi|km|kn|koeln|komatsu|kosher|kp|kpmg|kpn|kr|krd|kred|kuokgroup|kw|ky|kyoto|kz|la|lacaixa|ladbrokes|lamborghini|lamer|lancaster|lancia|lancome|land|landrover|lanxess|lasalle|lat|latino|latrobe|law|lawyer|lb|lc|lds|lease|leclerc|lefrak|legal|lego|lexus|lgbt|li|liaison|lidl|life|lifeinsurance|lifestyle|lighting|like|lilly|limited|limo|lincoln|linde|link|lipsy|live|living|lixil|lk|llc|llp|loan|loans|locker|locus|loft|lol|london|lotte|lotto|love|lpl|lplfinancial|lr|ls|lt|ltd|ltda|lu|lundbeck|lupin|luxe|luxury|lv|ly|ma|macys|madrid|maif|maison|makeup|man|management|mango|map|market|marketing|markets|marriott|marshalls|maserati|mattel|mba|mc|mcd|mcdonalds|mckinsey|md|me|med|media|meet|melbourneand Innovation|meme|memorial|men|menu|meo|merckmsd|metlife|mf|mg|mh|miami|microsoft|mil|mini|mint|mit|mitsubishi|mk|ml|mlb|mls|mm|mma|mn|mo|mobi|mobile|mobily|moda|moe|moi|mom|monash|money|monster|montblanc|mopar|mormon|mortgage|moscow|moto|motorcycles|mov|movie|movistar|mp|mq|mr|ms|msd|mt|mtn|mtpc|mtr|mu|museum|music|mutual|mutuelle|mv|mw|mx|my|mz|na|nab|nadex|nagoya|name|nationwide|natura|navy|nba|nc|ne|nec|net|netbank|netflix|network|neustar|new|newholland|news|next|nextdirect|nexus|nf|nfl|ng|ngo|nhk|ni|nico|nike|nikon|ninja|nissan|nissay|nl|no|nokia|northwesternmutual|norton|now|nowruz|nowtv|np|nr|nra|nrw|ntt|nu|nycTelecommunications|nz|obi|observer|off|office|okinawa|olayan|olayangroup|oldnavy|ollo|om|omega|one|ong|onl|online|onyourside|ooo|open|oracle|orange|org|organic|orientexpress|origins|osaka|otsuka|ott|ovh|pa|page|pamperedchef|panasonic|panerai|paris|pars|partners|parts|party|passagens|pay|pccw|pe|pet|pf|pfizer|pg|ph|pharmacy|phd|philips|phone|photo|photography|photos|physio|piaget|pics|pictet|pictures|pid|pin|ping|pink|pioneer|pizza|pk|pl|place|play|playstation|plumbing|plus|pm|pn|pnc|pohl|poker|politie|porn|post|pr|pramerica|praxi|press|prime|pro|prod|productions|prof|progressive|promo|properties|property|protection|pru|prudential|ps|pt|pub|pw|pwc|py|qa|qpon|quebec|quest|qvc|racing|radio|raid|re|read|realestate|realtor|realty|recipes|red|redstone|redumbrella|rehab|reise|reisen|reit|reliance|ren|rent|rentals|repair|report|republican|rest|restaurant|review|reviews|rexroth|rich|richardli|ricoh|rightathome|ril|rio|rip|rmit|ro|rocher|rocks|rodeo|rogers|room|rs|rsvp|ru|rugby|ruhr|run|rw|rwe|ryukyu|sa|saarland|safe|safety|sakura|sale|salon|samsclub|samsung|sandvik|sandvikcoromant|sanofi|sap|sapo|sarl|sas|save|saxo|sb|sbi|sbs|sc|sca|scb|schaeffler|schmidt|scholarships|school|schule|schwarz|science|scjohnson|scor|scot|sd|se|search|seat|secure|security|seek|select|sener|services|ses|seven|sew|sex|sexy|sfr|sg|sh|shangrila|sharp|shaw|shell|shia|shiksha|shoes|shop|shopping|shouji|show|showtime|shriram|si|silk|sina|singles|site|sj|sk|ski|skin|sky|skype|sl|sling|sm|smart|smile|sn|sncf|so|soccer|social|softbank|software|sohu|solar|solutions|song|sony|soy|spa|space|spiegel|sport|spot|spreadbetting|sr|srl|srt|ss|st|stada|staples|star|starhub|statebank|statefarm|statoil|stc|stcgroup|stockholm|storage|store|stream|studio|study|style|su|sucks|supplies|supply|support|surf|surgery|suzuki|sv|swatch|swiftcover|swiss|sx|sy|sydney|symantec|systems|sz|tab|taipei|talk|taobao|target|tatamotors|tatar|tattoo|tax|taxi|tc|tci|td|tdk|team|tech|technology|tel|telecity|telefonica|temasek|tennis|teva|tf|tg|th|thd|theater|theatre|tiaa|tickets|tienda|tiffany|tips|tires|tirol|tj|tjmaxx|tjx|tk|tkmaxx|tl|tm|tmall|tn|to|today|tokyo|tools|top|toray|toshiba|total|tours|town|toyota|toys|tp|tr|trade|trading|training|travel|travelchannel|travelers|travelersinsurance|trust|trv|tt|tube|tui|tunes|tushu|tv|tvs|tw|tz|ua|ubank|ubs|uconnect|ug|uk|um|unicom|university|uno|uol|ups|us|uy|uz|va|vacations|vana|vanguard|vc|ve|vegas|ventures|verisign|versicherung|vet|vg|vi|viajes|video|vig|viking|villas|vin|vip|virgin|visa|vision|vista|vistaprint|viva|vivo|vlaanderen|vn|vodka|volkswagen|volvo|vote|voting|voto|voyage|vu|vuelos|wales|walmart|walter|wang|wanggou|warman|watch|watches|weather|weatherchannel|webcam|weber|website|wed|wedding|weibo|weir|wf|whoswho|wien|wiki|williamhill|win|windows|wine|winners|wme|wolterskluwer|woodside|work|works|world|wow|ws|wtc|wtf|xbox|xerox|xfinity|xihuan|xin|xperia|xxx|xyz|yachts|yahoo|yamaxun|yandex|ye|yodobashi|yoga|yokohama|you|youtube|yt|yun|za|zappos|zara|zero|zip|zippo|zm|zone|zuerich|zw))(?:([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-]){0,1})/i

https://regex101.com/r/snRz15/1

Syed Faran Raza · Answer 31 · 2022-08-10T09:53:10.073

0

^(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?

This will verify the url link....

edited Aug 10 '22 at 09:53

answered Aug 03 '22 at 13:20

Syed Faran Raza

19
3

score -1 · Answer 32 · edited Sep 26 '15 at 09:09

-1

This is the best one.

NSString *urlRegex="(http|ftp|https|www|gopher|telnet|file)(://|.)([\\w_-]+(?:(?:\\.[\\w_-]+)‌+))([\\w.,@?^=%&:/~+#-]*[\\w@?^=%&/~+#-])?";

edited Sep 26 '15 at 09:09

Jay Bhalani

4,142
8
37
50

answered Aug 28 '15 at 07:16

Dhinakar

19
1
4

score -1 · Answer 33 · edited Mar 04 '18 at 13:10

-1

This is a simplest one. which work for me fine.

%(http|ftp|https|www)(://|\.)[A-Za-z0-9-_\.]*(\.)[a-z]*%

edited Mar 04 '18 at 13:10

Xantium

11,201
10
62
89

answered Feb 20 '18 at 16:04

Md. Miraj Khan

377
3
15

score -1 · Answer 34 · answered Nov 06 '19 at 01:11

It is just simple.

Use this pattern: \b((ftp|https?)://)?([\w-\.]+\.(com|net|org|gov|mil|int|edu|info|me)|(\d+\.\d+\.\d+\.\d+))(:\d+)?(\/[\w-\/]*(\?\w*(=\w+)*[&\w-=]*)*(#[\w-]+)*)?

It matches any link contains:

Allowed Protocols: http, https and ftp

Allowed Domains: *.com, *.net, *.org, *.gov, *.mil, *.int, *.edu, *.info and *.me OR IP

Allowed Ports: true

Allowed Parameters: true

Allowed Hashes: true

Miku Chan · Answer 35 · 2022-02-13T17:45:12.740

Try this, it can detect any kind of url with any kind of scheme and yes its tested , while copying remember that there should not be blank psace or a new line character

(?:http://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)(?:/(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;:@&=])*))*)(?:\?(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;:@&=])*))?)?)|(?:ftp://(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?&=])*)(?::(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?&=])*))?@)?(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?))(?:/(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*))*)(?:;type=[AIDaid])?)?)|(?:news:(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;/?:&=])+@(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3})))|(?:[a-zA-Z](?:[a-zA-Z\d]|[_.+-])*)|\*))|(?:nntp://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)/(?:[a-zA-Z](?:[a-zA-Z\d]|[_.+-])*)(?:/(?:\d+))?)|(?:telnet://(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?&=])*)(?::(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?&=])*))?@)?(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?))/?)|(?:gopher://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)(?:/(?:[a-zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))*)(?:%09(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;:@&=])*)(?:%09(?:(?:[a-zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))*))?)?)?)?)|(?:wais://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)/(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)(?:(?:/(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)/(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*))|\?(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;:@&=])*))?)|(?:mailto:(?:(?:[a-zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))+))|(?:file://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))|localhost)?/(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*))*))|(?:prospero://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)/(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*))*)(?:(?:;(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&])*)=(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&])*)))*)|(?:ldap://(?:(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?))?/(?:(?:(?:(?:(?:(?:(?:[a-zA-Z\d]|%(?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID|oid)\.(?:(?:\d+)(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(?:%20)*))?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*))(?:(?:(?:%0[Aa])?(?:%20)*)\+(?:(?:%0[Aa])?(?:%20)*)(?:(?:(?:(?:(?:[a-zA-Z\d]|%(?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID|oid)\.(?:(?:\d+)(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(?:%20)*))?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)))*)(?:(?:(?:(?:%0[Aa])?(?:%20)*)(?:[;,])(?:(?:%0[Aa])?(?:%20)*))(?:(?:(?:(?:(?:(?:[a-zA-Z\d]|%(?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID|oid)\.(?:(?:\d+)(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(?:%20)*))?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*))(?:(?:(?:%0[Aa])?(?:%20)*)\+(?:(?:%0[Aa])?(?:%20)*)(?:(?:(?:(?:(?:[a-zA-Z\d]|%(?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID|oid)\.(?:(?:\d+)(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(?:%20)*))?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)))*))*(?:(?:(?:%0[Aa])?(?:%20)*)(?:[;,])(?:(?:%0[Aa])?(?:%20)*))?)(?:\?(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+)(?:,(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+))*)?)(?:\?(?:base|one|sub)(?:\?(?:((?:[a-zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))+)))?)?)?)|(?:(?:z39\.50[rs])://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+)(?:\+(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+))*(?:\?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+))?)?(?:;esn=(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+))?(?:;rs=(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+)(?:\+(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+))*)?))|(?:cid:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?:@&=])*))|(?:mid:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?:@&=])*))?)|(?:vemmi://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[/?:@&=])*)(?:(?:;(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[/?:@&])*)=(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[/?:@&])*))*))?)|(?:imap://(?:(?:(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~])+)(?:(?:;[Aa][Uu][Tt][Hh]=(?:\*|(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~])+))))?)|(?:(?:;[Aa][Uu][Tt][Hh]=(?:\*|(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~])+)))(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~])+))?))@)?(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?))/(?:(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~:@/])+)?;[Tt][Yy][Pp][Ee]=(?:[Ll](?:[Ii][Ss][Tt]|[Ss][Uu][Bb])))|(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~:@/])+)(?:\?(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~:@/])+))?(?:(?:;[Uu][Ii][Dd][Vv][Aa][Ll][Ii][Dd][Ii][Tt][Yy]=(?:[1-9]\d*)))?)|(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~:@/])+)(?:(?:;[Uu][Ii][Dd][Vv][Aa][Ll][Ii][Dd][Ii][Tt][Yy]=(?:[1-9]\d*)))?(?:/;[Uu][Ii][Dd]=(?:[1-9]\d*))(?:(?:/;[Ss][Ee][Cc][Tt][Ii][Oo][Nn]=(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~:@/])+)))?)))?)|(?:nfs:(?:(?://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)(?:(?:/(?:(?:(?:(?:(?:[a-zA-Z\d\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA-Z\d\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*))*)?)))?)|(?:/(?:(?:(?:(?:(?:[a-zA-Z\d\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA-Z\d\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*))*)?))|(?:(?:(?:(?:(?:[a-zA-Z\d\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA-Z\d\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*))*)?)))

Regular expression to find URLs within a string

35 Answers35

Linked

Related