Extracting a specific website URL from a string

Question

I have a regular expression that matches website URLs:

.+\.\w\w.*(.*)

I would like to extract only the URL from a string, for example:

what is google.com?

When I run my code:

var x = /.+\.\w\w.*(.*)/
x.exec( "what is <http://google.com>?" )

it returns

["what is http://google.com?", ""]

instead of just the URL that I want it to match. Why is this happening?

Use a regexp testing site such as regex101 to test your expressions. – Jun 18 '16 at 06:27

score 0 · Answer 1 · edited May 23 '17 at 10:29

This happens because your regex does not specifically match URLs; it matches far more, including the text around them.

For some inspiration on how to match URLs, you could have a look at the proposal from this StackOverflow answer:

https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)
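As a rough sketch of how the pattern above could be used with exec, the snippet below wraps it in a small helper. The name extractUrl is hypothetical and only for illustration. Two things are worth noticing: the pattern requires an http:// or https:// prefix, and because ? is part of the final character class, a question mark that directly follows the URL becomes part of the match.

```javascript
// URL pattern quoted in the answer above, written as a JavaScript regex literal.
const urlPattern = /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&\/=]*)/;

// Hypothetical helper: returns the first URL-like match in `text`, or null.
function extractUrl(text) {
  const match = urlPattern.exec(text);
  return match ? match[0] : null;
}

// The trailing "?" is kept, because "?" is allowed by the last character class.
console.log(extractUrl("what is http://google.com?"));   // "http://google.com?"

// ">" is not in any character class, so the match stops before it.
console.log(extractUrl("what is <http://google.com>?")); // "http://google.com"

// No scheme, so this pattern finds nothing.
console.log(extractUrl("what is google.com?"));          // null
```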

edited May 23 '17 at 10:29

Community

answered Jun 18 '16 at 01:51

TimoStaudinger

Ah, so that's why. I tested your regex and it works perfectly with exec. I have one more question, though: I have an existing regex that uses my old pattern to match a sentence, /(is\s*(.+\.\w\w.*)\sdown?)/. How do I integrate your URL pattern into it? – deathknight256 Jun 18 '16 at 01:57
I tried /(is\s*([-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*))\sdown[?])/ – deathknight256 Jun 18 '16 at 02:15

score 0 · Accepted Answer · edited May 23 '17 at 12:26

Description

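In your expression the . matches any character, and the + and * quantifiers are greedy. The leading .+ consumes as much of the string as it can and backs off only far enough for \.\w\w to match; the following .* then takes everything that remains, so nothing is left for the capture group (.*). The net effect is that the overall match is the entire string and the capture group is empty.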
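To make the greedy behaviour visible, here is a small sketch that runs the original pattern and then a copy with every piece wrapped in its own capture group. The extra groups are added purely for illustration; they do not change what is matched.

```javascript
const input = "what is google.com?";

// The pattern from the question: the only capture group comes last.
const original = /.+\.\w\w.*(.*)/;
const result = original.exec(input);
console.log(result[0]); // "what is google.com?" (the whole string)
console.log(result[1]); // "" (nothing left for the group)

// Same pattern, with each piece captured so we can see what it consumed.
const instrumented = /(.+)(\.\w\w)(.*)(.*)/;
const parts = instrumented.exec(input);
console.log(parts[1]); // "what is google"  taken by .+
console.log(parts[2]); // ".co"             taken by \.\w\w
console.log(parts[3]); // "m?"              taken by the first .*
console.log(parts[4]); // ""                the original capture group
```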

([-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6})\b([-a-zA-Z0-9@:%_\+.~#?&\/=]*)

This regular expression will do the following:

find strings that resemble URLs
ignore any leading http:// or https://
split the query string from the URL

Example

Live Demo

https://regex101.com/r/kB1mS6/3

Sample text

what is <http://google.com>?
what is www.ibm.com?
are these the Droids.I.com?Lookingfor=Yes

Sample Matches

Capture group 0 gets the URL and the query string, if one exists
Capture group 1 gets the URL
Capture group 2 gets the query string, if one exists

MATCH 1
1.  [16-26] `google.com`
2.  [26-26] ``

MATCH 2
1.  [37-48] `www.ibm.com`
2.  [48-49] `?`

MATCH 3
1.  [64-76] `Droids.I.com`
2.  [76-91] `?Lookingfor=Yes`
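The matches above can be reproduced in JavaScript with something like the following sketch. It assumes an environment that supports String.prototype.matchAll (ES2020), and the variable names are illustrative only.

```javascript
// The expression from this answer, with the global flag so every match is found.
const urlLike = /([-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6})\b([-a-zA-Z0-9@:%_\+.~#?&\/=]*)/g;

// The sample text, one sentence per line.
const sampleText = [
  "what is <http://google.com>?",
  "what is www.ibm.com?",
  "are these the Droids.I.com?Lookingfor=Yes",
].join("\n");

// Collect group 1 (URL), group 2 (query string) and the start offset of each match.
const matches = [...sampleText.matchAll(urlLike)].map((m) => ({
  url: m[1],
  query: m[2],
  start: m.index,
}));

for (const { url, query, start } of matches) {
  console.log(`[${start}] url=${url} query=${query}`);
}
```

Note that group 2 for www.ibm.com is the trailing ?, since ? is one of the characters allowed in the query-string class.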

To further capture additional words in the sentence you can modify the expression:

([-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6})\b([-a-zA-Z0-9@:%_\+.~#?&\/=]*)(?:>?\s+(down))?

Examples

Live Demo

https://regex101.com/r/kB1mS6/4

Sample Text

what is <http://google.com> down?
what is www.ibm.com?
are these the Droids.I.com?Lookingfor=Yes
why is http://www.bing.com down?
why is www.bing.com down?
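As a sketch of how the modified expression might be used on these sentences, the snippet below reports the URL and whether the optional third group captured the word down. The helper name parseStatusQuestion and the shape of its return value are assumptions made for illustration.

```javascript
// The modified expression: group 3 is set only when "down" follows the URL.
const urlDown = /([-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6})\b([-a-zA-Z0-9@:%_\+.~#?&\/=]*)(?:>?\s+(down))?/;

// Hypothetical helper: describes the first URL-like match in a sentence.
function parseStatusQuestion(sentence) {
  const m = urlDown.exec(sentence);
  if (!m) return null;
  return {
    url: m[1],
    query: m[2],
    // m[3] is undefined when the optional "down" part did not match.
    asksIfDown: m[3] === "down",
  };
}

const sentences = [
  "what is <http://google.com> down?",
  "what is www.ibm.com?",
  "are these the Droids.I.com?Lookingfor=Yes",
  "why is http://www.bing.com down?",
  "why is www.bing.com down?",
];

for (const s of sentences) {
  console.log(s, "->", parseStatusQuestion(s));
}
```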