I'm using Google Apps Script to fetch the content of emails from Gmail, and then I need to extract all of the links from the HTML tags. I found some code here on Stack Overflow and implemented it with a regular expression, but the issue is that it always returns only the first URL (http://vacante2016.eu/tr/17599/51743713/c4f5eadf38eb475d39e3cdeca9201538).
Is there a way to write a loop that searches for the next match of the regular expression, so that all of the URLs are returned one by one?
Here you can see an example with the content of an email that I need to get those links from: https://www.mailinator.com/inbox2.jsp?public_to=get_urls#/#public_showmaildiv
This is my code:
function getURL() {
  var threads = GmailApp.getInboxThreads();
  var message = threads[0].getMessages()[0];
  var content = message.getRawContent();
  var source = (content || '').toString();
  var urlArray = [];
  var url;
  var matchArray;
  // Regular expression to find FTP, HTTP(S) URLs.
  var regexToken = /(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/;
  // Iterate through any URLs in the text.
  while ((matchArray = regexToken.exec(source)) !== null) {
    var token = matchArray[0];
    urlArray.push(token);
  }
}
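For reference, exec() only moves forward through the string when the regex carries the g flag; without it, every call starts again at index 0 and returns the same first match. Below is a minimal sketch of the loop with that flag set. The function name findAllUrls and the sample string are made up for illustration and are not part of the script above.

```javascript
// Hypothetical helper (not from the original script): collects every match.
// The g flag makes exec() resume from regexToken.lastIndex on each call.
function findAllUrls(source) {
  var regexToken = /(?:https?|ftps?):\/\/[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,3}(?:\/\S*)?/g;
  var urlArray = [];
  var matchArray;
  while ((matchArray = regexToken.exec(source)) !== null) {
    urlArray.push(matchArray[0]);
  }
  return urlArray;
}

// Made-up sample text, for illustration only.
var sample = 'see http://example.com/a and https://example.org/b/c for details';
var found = findAllUrls(sample);
```

In Apps Script the result could be inspected with Logger.log(found).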
UPDATE:
Changing the regex to /(?:ht|f)tps?\:\/\/[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,3}(\/[\S=]*)?/g improved things, but now I also get results like the following when I search for URLs: "http://vacante2016.eu/clk/17599/5=\r\n1743713/150132/bf7639dd7e7aa48c9197a52a8c61e168\"><img"
I think the regex should also have a condition so that the URL is returned only up to the > symbol.
Also, is there a way to remove the additional characters like =, \r and \n from the found URL?
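A minimal sketch of one possible approach follows. It assumes the =\r\n sequences are quoted-printable soft line breaks in the raw message, which is a guess based on the output above. The function name extractCleanUrls is hypothetical. It strips those breaks first, then ends each URL at the first whitespace, quote, or angle bracket.

```javascript
// Hypothetical helper (not from the original script).
function extractCleanUrls(rawContent) {
  // Assumption: "=" followed by a line ending is a quoted-printable soft
  // line break, so removing it rejoins URLs that were split across lines.
  var unfolded = String(rawContent || '').replace(/=\r?\n/g, '');
  // The path part stops at whitespace, quotes, or angle brackets,
  // so trailing markup such as "><img is not captured.
  var regexToken = /(?:https?|ftps?):\/\/[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,3}(?:\/[^\s"'<>]*)?/g;
  return unfolded.match(regexToken) || [];
}

// Sample built from the problematic result quoted in the update.
var raw = '<a href="http://vacante2016.eu/clk/17599/5=\r\n1743713/150132/bf7639dd7e7aa48c9197a52a8c61e168"><img';
var urls = extractCleanUrls(raw);
```

If the content really is quoted-printable, other escapes such as =3D (an encoded =) would remain untouched by this sketch. Reading the message with getBody() instead of getRawContent() might avoid the encoding entirely, though I have not verified that for this email.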