I have a regular expression that returns all the links from an HTML file, but it has a problem: instead of returning just the link, like http://link.com, it also returns the href=" prefix (href="http://link.com). What can I do to get only the links, without the href=" prefix?
This is my regex:
/href="(http|https|ftp|ftps)\:\/\/[-a-zA-Z0-9.]+\.[a-zA-Z]{2,3}(?:\/(?:[^"<=]|=)*)?/g
Full code:
var source = (body || '').toString();
var urlArray = [];
var url;
var matchArray;
// Regular expression to find FTP, HTTP(S) URLs.
var regexToken = /href="(http|https|ftp|ftps)\:\/\/[-a-zA-Z0-9.]+\.[a-zA-Z]{2,3}(?:\/(?:[^"<=]|=)*)?/g;
// Iterate through any URLs in the text.
while( (matchArray = regexToken.exec( source )) !== null )
{
    var token = matchArray[0];
    token = JSON.stringify(matchArray[0]);
    token = matchArray[0].toString();
    urlArray.push([ token ]);
}
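One possible direction, as a minimal sketch rather than a definitive fix: matchArray[0] is always the whole match, so it includes href=". If the URL part is wrapped in its own capturing group, it should be available as matchArray[1]. The function name extractLinks is hypothetical and used only for illustration, the scheme alternation is made non-capturing so that the URL is group 1, and the sketch pushes plain strings instead of one-element arrays.

```javascript
// Hypothetical helper (not in the original code): collects only the URL part of each href.
function extractLinks(body) {
    var source = (body || '').toString();
    var urlArray = [];
    var matchArray;
    // Same pattern as above, except that the whole URL is wrapped in capturing
    // group 1 and the scheme alternation is non-capturing, written (?:...).
    var regexToken = /href="((?:http|https|ftp|ftps)\:\/\/[-a-zA-Z0-9.]+\.[a-zA-Z]{2,3}(?:\/(?:[^"<=]|=)*)?)/g;
    // matchArray[0] is the full match (with href="); matchArray[1] is the URL only.
    while ((matchArray = regexToken.exec(source)) !== null) {
        urlArray.push(matchArray[1]);
    }
    return urlArray;
}
```

Called on a string such as '<a href="http://link.com">x</a>', this should return an array holding only http://link.com. The pattern still has the limits of the original one, such as the two- or three-letter top-level domain.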