Regular expression for Google Script - Fetched HTML

Question

I have a Google script that gets content from a URL. I'm using regular expressions to find the content I need to grab, for instance:

var htmlSubCategory = UrlFetchApp.fetch(url).getContentText();    
var regexpFindingAllLinks = /<div class="small-12 medium-5 large-4 columns"><a href="\/(.*?)\//g;
var linksProducts = regexpFindingAllLinks.exec(htmlSubCategory);

I'm having problems writing another regular expression for finding the titles of some items. The source code looks like this:

<p class="heading"><span class="highlight-ico"></span><a href="/url-1/" title="some title for URL 1">Title I need to grab</a></p>
<p class="heading"><span class="highlight-ico"></span><a href="/url-2/" title="some title for URL 2">Title I need to grab</a></p>

I basically need to have a regex that would look for

<p class="heading"><span class="highlight-ico"></span><a href="(can be any content)" title="(can be any content)">(grab this content)</a></p>

Secondly, I would like to have a regex that would grab only reference numbers, which look like this: X12345678, where X is a letter, followed by 8 digits.

I'm new to these scripts, any help would be appreciated.

oh, my bad, comment removed :p – Jaromanda X, Jun 19 '17 at 10:23
Question posts should have only one question. – Rubén, Jun 20 '17 at 02:24

score 0 · Accepted Answer · answered Jun 19 '17 at 10:36

You shouldn't use regex to parse HTML, but if you can't do it any other way, use this:

/<p class="heading"><span class="highlight-ico"><\/span><a href="[^"]*" title="[^"]*">((?:(?!<\/a>).)*)<\/a><\/p>/
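As a rough sketch of how the title pattern might be applied to every matching paragraph: the version below adds the g flag and calls exec() in a loop, reading capture group 1 each time. The sample HTML string and the helper name collectTitles are made up for illustration; in the actual script the text would come from UrlFetchApp.fetch(url).getContentText(), as in the question.

```javascript
// Sample markup standing in for the fetched page (illustrative only).
var html =
  '<p class="heading"><span class="highlight-ico"></span><a href="/url-1/" title="some title for URL 1">First title</a></p>\n' +
  '<p class="heading"><span class="highlight-ico"></span><a href="/url-2/" title="some title for URL 2">Second title</a></p>';

// Same pattern as in the answer, plus the g flag so that each exec() call
// resumes where the previous match ended.
var titleRegex = /<p class="heading"><span class="highlight-ico"><\/span><a href="[^"]*" title="[^"]*">((?:(?!<\/a>).)*)<\/a><\/p>/g;

// Hypothetical helper: returns the link text of every matching paragraph.
function collectTitles(text) {
  var titles = [];
  var match;
  titleRegex.lastIndex = 0; // start from the beginning on every call
  while ((match = titleRegex.exec(text)) !== null) {
    titles.push(match[1]); // group 1 is the text between <a ...> and </a>
  }
  return titles;
}

var titles = collectTitles(html);
```

Without the g flag, exec() would keep returning the first match, so the loop would never terminate; with it, exec() returns null once no further match is found.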

For your second question (matching reference numbers), use this:

/[A-Za-z]\d{8}/
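A minimal sketch of collecting every reference number from a piece of text, assuming the leading character can be any ASCII letter (as the question describes) and that a reference number stands on its own as a word. The \b word boundaries, the helper name findReferenceNumbers, and the sample string are assumptions added for illustration.

```javascript
// Hypothetical helper: returns all reference numbers found in the text.
// Assumption: one ASCII letter followed by exactly 8 digits; the \b anchors
// reject longer runs such as a letter followed by 9 digits.
function findReferenceNumbers(text) {
  var matches = text.match(/\b[A-Za-z]\d{8}\b/g);
  return matches === null ? [] : matches; // match() returns null when nothing is found
}

// Illustrative input: two valid references, one too short, one without a letter.
var sample = 'Refs A12345678 and B87654321, but not C1234567 or 123456789';
var refs = findReferenceNumbers(sample);
```

If the letter is known to be uppercase only, the character class can be narrowed to [A-Z].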
