I am new to regex and have been searching for some time for a suitable pattern to retrieve URLs from a paragraph of text.

The current regex I am using:

text.match(/(((ftp|https?):\/\/)(www\.)?|www\.)([\da-z-_\.]+)([a-z\.]{2,7})([\/\w\.-_\?\&]*)*\/?/g);

It returns 'www.mik' as a match for text like '...my webpage is www.mikealbert.com...', which makes it unsuitable for my purposes.

--

So far, the following regex gives me the best result for matching URLs ('www.mik' is not matched, but 'www.mikealbert.com' is):

/(https:[/][/]|http:[/][/]|www.)[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?\/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*$/.test("www.google.com");

However, it can only be used to test a single URL. How should I modify the above regex to return an array of matching URLs? I will also need the regex to handle URLs with paths, such as www.facebook.com/abc123?apple=pie&blueberry=cake

Thanks for any help!

garrethp

1 Answer

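Remove the dollar sign ($) from the end of the regex and add the g flag, so that match() returns every URL in the text instead of requiring the URL to sit at the end of the string.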

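// Same pattern as in the question, minus the trailing $ anchor, plus the g flag.
// The final group keeps its * so that paths and query strings are matched in full,
// and the dot in www\. is escaped so it only matches a literal dot.
var regex = /(https:[/][/]|http:[/][/]|www\.)[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?\/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*/g;
var input = "https://stackoverflow.com/ lorem ipsum dolor sit amet http://google.com dolor sit amet www.foo.com";
// match() returns null when nothing matches, so fall back to an empty array.
console.log(input.match(regex) || []);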

Output:

[ 'https://stackoverflow.com/',
  'http://google.com',
  'www.foo.com' ]
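
The question also asks about URLs with paths and query strings. Below is a minimal sketch of the same idea wrapped in a helper, using the example text from the question. The names extractUrls and URL_PATTERN are hypothetical and only for illustration. The pattern is the one from the question with the trailing $ removed, the g flag added, and the dot in www\. escaped; the * on the last group is kept so the whole path is consumed.

```javascript
// Hypothetical constant and helper names; they do not appear in the original post.
var URL_PATTERN = /(https:[/][/]|http:[/][/]|www\.)[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?\/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*/g;

function extractUrls(text) {
  // With the g flag, match() returns every match as an array, or null if there are none.
  return text.match(URL_PATTERN) || [];
}

var sample = "my webpage is www.mikealbert.com and www.facebook.com/abc123?apple=pie&blueberry=cake is another";
var urls = extractUrls(sample);
console.log(urls);
```

Because the last character class includes the dot and the comma, a URL followed directly by sentence punctuation (for example "www.foo.com.") will probably pick up that trailing character, so some post-processing may be needed depending on the input.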
Krzysztof Safjanowski