
I have a Google Apps Script that fetches content from a URL. I'm using regular expressions to find the content I need to grab, for instance:

var htmlSubCategory = UrlFetchApp.fetch(url).getContentText();
var regexpFindingAllLinks = /<div class="small-12 medium-5 large-4 columns"><a href="\/(.*?)\//g;
var linksProducts = regexpFindingAllLinks.exec(htmlSubCategory);
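
Note that even with the g flag, a single exec() call returns only the first match (it just advances lastIndex), so collecting every link takes a loop. A minimal sketch, assuming the page source has already been fetched into a string; the function name findProductLinks is hypothetical:

```javascript
// Hypothetical helper: collects the first path segment of every product link.
// Assumes `html` holds the page source, e.g. UrlFetchApp.fetch(url).getContentText().
function findProductLinks(html) {
  // Created inside the function so lastIndex starts at 0 on every call.
  var regexpFindingAllLinks = /<div class="small-12 medium-5 large-4 columns"><a href="\/(.*?)\//g;
  var links = [];
  var match;
  // With the g flag, each exec() call resumes at lastIndex and returns null when done.
  while ((match = regexpFindingAllLinks.exec(html)) !== null) {
    links.push(match[1]); // capture group 1: text between the first two slashes of href
  }
  return links;
}
```

In Apps Script this could be called as findProductLinks(UrlFetchApp.fetch(url).getContentText()) to get an array of path segments instead of a single match.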

I'm having trouble writing another regular expression to find the titles of some items. The source code looks like this:

<p class="heading"><span class="highlight-ico"></span><a href="/url-1/" title="some title for URL 1">Title I need to grab</a></p>
<p class="heading"><span class="highlight-ico"></span><a href="/url-2/" title="some title for URL 2">Title I need to grab</a></p>

Basically, I need a regex that matches this pattern:

<p class="heading"><span class="highlight-ico"></span><a href="(can be any content)" title="(can be any content)">(grab this content)</a></p>

Second, I need a regex that grabs only reference numbers, which look like X12345678, where X is a letter followed by 8 digits.

I'm new to these scripts, so any help would be appreciated.

Rubén
Tusaro

1 Answer


Regular expressions are a fragile way to parse HTML, but if you can't do it any other way, this pattern works for the markup you showed:

/<p class="heading"><span class="highlight-ico"><\/span><a href="[^"]*" title="[^"]*">((?:(?!<\/a>).)*)<\/a><\/p>/
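
A minimal sketch of applying that pattern to a whole page and collecting every title. It assumes the markup matches the sample exactly (same attribute order, no line breaks inside an item) and adds the g flag so exec() can step through all matches; the function name findHeadingTitles is hypothetical:

```javascript
// Hypothetical helper: returns the link text of every <p class="heading"> item.
// Assumes the HTML looks exactly like the question's sample markup.
function findHeadingTitles(html) {
  // Same pattern as above plus the g flag; the tempered token stops at the first </a>.
  var regexpTitles = /<p class="heading"><span class="highlight-ico"><\/span><a href="[^"]*" title="[^"]*">((?:(?!<\/a>).)*)<\/a><\/p>/g;
  var titles = [];
  var match;
  while ((match = regexpTitles.exec(html)) !== null) {
    titles.push(match[1]); // group 1: the text between <a ...> and </a>
  }
  return titles;
}
```

Because the pattern hard-codes the class names and attribute order, any change to the page's markup will silently produce an empty array rather than an error.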

For your second question (matching reference numbers), use this:

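/\b[A-Za-z]\d{8}\b/

Here [A-Za-z] matches any letter (a literal X would only match codes starting with X), and the \b word boundaries keep it from matching inside longer runs of letters or digits.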
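
A minimal sketch of pulling every reference number out of a string. It assumes "X" in the question means any ASCII letter, so it uses [A-Za-z] rather than a literal X, and wraps the pattern in \b word boundaries; the function name findReferenceNumbers is hypothetical:

```javascript
// Hypothetical helper: finds codes made of one letter followed by exactly 8 digits.
// \b on both sides rejects longer runs such as "XY123456789" or "X123456789".
function findReferenceNumbers(text) {
  var matches = text.match(/\b[A-Za-z]\d{8}\b/g);
  return matches || []; // String.prototype.match returns null when nothing matches
}
```

With the g flag, match() returns all matched strings at once, which is simpler than an exec() loop when no capture groups are needed.
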
squirl