I need to scrape a website whose pagination markup looks like this (the page numbers vary):
<a class="page-numbers" href="https://example.com/page/6/">6</a>
<a class="page-numbers" href="https://example.com/page/7/">7</a>
<a class="page-numbers" href="https://example.com/page/8/">8</a>
<a class="next page-numbers" href="https://example.com/page/49/">NEXT</a>
I need the last page number from the numbered links, which is 8 in the example above (the NEXT link, which points to page 49, should be ignored).
I'm using Apps Script with Google Sheets and have tried various solutions, including grouping so that the full page numbers are displayed. Based on the example above, my final output should be: Total pages: 8
Could any of you REGEX wizards help?
Additional notes:
- Using pure JS isn't an option
- There can be any number of pages, so the value I want isn't always the third occurrence
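A minimal sketch, assuming the pagination links look like the sample above: collect every `page-numbers` link whose text is a number and keep the largest, so it doesn't matter how many links there are or which occurrence is last. It uses a regex literal with the global flag and a plain `exec` loop, which Apps Script's JavaScript runtime supports; `getLastPageNumber` and `sampleHtml` are hypothetical names.

```javascript
// Hypothetical helper: returns the largest number found in <a> links whose
// class starts with "page-numbers", or null if there are none.
function getLastPageNumber(html) {
  // Regex literal, so \d needs no extra escaping. The NEXT link is skipped
  // because its class starts with "next" and its text is not a number.
  const re = /<a class="page-numbers[^>]*>(\d+)<\/a>/g;
  let last = null;
  let m;
  while ((m = re.exec(html)) !== null) {
    const n = parseInt(m[1], 10);
    if (last === null || n > last) last = n;
  }
  return last;
}

const sampleHtml = [
  '<a class="page-numbers" href="https://example.com/page/6/">6</a>',
  '<a class="page-numbers" href="https://example.com/page/7/">7</a>',
  '<a class="page-numbers" href="https://example.com/page/8/">8</a>',
  '<a class="next page-numbers" href="https://example.com/page/49/">NEXT</a>',
].join('\n');

console.log('Total pages: ' + getLastPageNumber(sampleHtml)); // Total pages: 8
```

Taking the maximum instead of the last match keeps the result correct even if the links are not in ascending order.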
The code below returns nothing:
function regex_validshouldwork() {
  const url = 'https://example.com',
    response = UrlFetchApp.fetch(url);
  let content;
  let html = response.getContentText();
  const myRegex = new RegExp("(?:\<a class=\"page-numbers[^>]*>(\d+)<\/a>\s*)+");
  content = html.match(myRegex);
  SpreadsheetApp.getActiveSheet().getRange('a2').setValue(content);
}
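A likely reason the function returns nothing: inside an ordinary string literal, `\d` and `\s` are string escapes rather than regex escapes, so `"\d+"` becomes the string `d+` before `RegExp` sees it, and the pattern then looks for a literal `d`. Using a regex literal (or doubling the backslashes) fixes that. Also, `match` returns an array or `null`, so the captured group should be written to the cell rather than the whole result. The sketch below keeps your pattern, relying on the fact that when a capturing group sits inside a repeated group, JavaScript keeps only the capture from the final repetition. `totalPagesText`, `regexValidShouldWorkFixed` and `sample` are hypothetical names; the `UrlFetchApp` and `SpreadsheetApp` calls are the same ones your code already uses.

```javascript
// In an ordinary string literal "\d" is just "d", so this RegExp looks for
// a literal "d" and never matches the page links.
console.log(new RegExp("(\d+)").source);  // (d+)
console.log(new RegExp("(\\d+)").source); // (\d+)

// Your pattern as a regex literal. Group 1 keeps only the last repetition's
// capture, i.e. the number in the last link of the first unbroken run.
const PAGE_RUN = /(?:<a class="page-numbers[^>]*>(\d+)<\/a>\s*)+/;

function totalPagesText(html) {
  const m = html.match(PAGE_RUN);
  // match() returns null or an array; use the captured number, not the array.
  return m ? 'Total pages: ' + m[1] : 'No page links found';
}

// Hypothetical Apps Script entry point. UrlFetchApp and SpreadsheetApp exist
// only inside Apps Script, so run this from there, not from Node.
function regexValidShouldWorkFixed() {
  const html = UrlFetchApp.fetch('https://example.com').getContentText();
  SpreadsheetApp.getActiveSheet().getRange('A2').setValue(totalPagesText(html));
}

const sample =
  '<a class="page-numbers" href="https://example.com/page/6/">6</a>\n' +
  '<a class="page-numbers" href="https://example.com/page/7/">7</a>\n' +
  '<a class="page-numbers" href="https://example.com/page/8/">8</a>\n' +
  '<a class="next page-numbers" href="https://example.com/page/49/">NEXT</a>';
console.log(totalPagesText(sample)); // Total pages: 8
```

This only works while the links are separated by whitespace alone. If the real page puts other markup (for example a "…" span) between the links, the run stops early; collecting every match with a global regex and taking the maximum is more robust in that case.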