
A page I'm trying to scrape loads with an initial body of content, followed by a "Load more" button at the bottom of the page. When the "Load more" button is clicked, the button is removed from the page, additional content is loaded further down (preserving the content from the initial load), and a new "Load more" button is placed at the bottom of the page. The URL of the page does NOT change when the "Load more" button is clicked; in other words, the page behaves like a single-page application (SPA).

Using Puppeteer I'm able to navigate to the page:

```javascript
let page = await browser.newPage();
await page.goto('https://www.someURL.com/home', {
  waitUntil: 'domcontentloaded',
});
```

I then use `page.$$eval()` to find the "Load more" button and click it. The page loads the additional content along with a new "Load more" button at the bottom. However, I can't find a way in Puppeteer to 'refresh' my `page` variable so that I could call `page.$$eval()` a second time to find the new "Load more" button. Calling `page.reload()` reverts the page to the state it was in when I called `page.goto()`.
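The `page` object is a handle to the live browser tab rather than a snapshot, so each `page.$$eval()` call queries the DOM as it is at that moment and there is nothing to refresh. A minimal sketch of the text-matching click described above, assuming the button's trimmed text is exactly "Load more" (the helper name `clickLoadMore` is hypothetical):

```javascript
// Hypothetical helper mirroring the question's approach: find the "Load more"
// button by its text inside the page and click it there.
// Assumes `page` is a Puppeteer Page. Returns true if a button was clicked.
async function clickLoadMore(page, label = 'Load more') {
  return page.$$eval(
    'button',
    (buttons, text) => {
      // This callback runs in the browser against the *current* DOM,
      // so a second call sees buttons added after the first click.
      const button = buttons.find((b) => b.textContent.trim() === text);
      if (!button) return false;
      button.click();
      return true;
    },
    label,
  );
}
```

Calling `clickLoadMore(page)` again on the same `page` will find the newly inserted button, provided the new content has had time to arrive between calls (for example by waiting with `page.waitForSelector()` or `page.waitForFunction()`).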

I've scoured the Puppeteer docs and looked through dozens of examples, and I can't find a way to do this. Given the prevalence of SPAs, I must be overlooking something obvious.

Is this possible?

EDIT: Additional code using @vsemozhetbyt's suggestion:

```javascript
// Runs inside an async function that already has `page`
let theButton = await page.$('button.sc-fzoiQi');
do {
  await page.evaluate(theButton => {
    theButton.click();
  }, theButton);
} while ((await buttonExists(page)) !== null);

async function buttonExists(page) {
  return await page.$('button.sc-fzoiQi');
}
```

Using the above, the button is clicked the first time, but the `while` condition never completes: the `page.$('button.sc-fzoiQi')` call inside `buttonExists()` never returns.

Nick
  • Try to put this line inside the `do` clause: `let theButton = await page.$('button.sc-fzoiQi');`. The button can be deleted and recreated on each update, so we need a fresh reference to the element on each iteration. – vsemozhetbyt Jul 14 '20 at 15:53
  • That did it. Oddly, I have to slow down the `browser` instance (passing `slowMo: 1000` to `puppeteer.launch()`) for the clicks to work, but I can live with it. Thank you! – Nick Jul 15 '20 at 03:36
  • You can also try clicking in the Puppeteer context with the `delay` option to check whether this helps. It may be a lighter drawback. https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#mouseclickx-y-options – vsemozhetbyt Jul 15 '20 at 06:37
  • Does this answer your question? [puppeteer: how to wait until an element is visible?](https://stackoverflow.com/questions/46135853/puppeteer-how-to-wait-until-an-element-is-visible) – ggorlen Nov 29 '20 at 23:13
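A sketch combining the comment suggestions: get a fresh handle on every iteration, click it from the Puppeteer side (where `ElementHandle.click()` accepts a `delay` option), and wait for the clicked button to be detached instead of slowing the whole browser with `slowMo`. It assumes `page` is a Puppeteer Page and that each click removes the old button and later inserts a new one, as the question describes; the function name and the timeout values are hypothetical.

```javascript
// Hypothetical helper: clicks "Load more" until no new button appears.
// Assumes `page` is a Puppeteer Page.
async function clickLoadMoreUntilGone(page, selector, { timeoutMs = 5000, clickDelayMs = 0 } = {}) {
  let clicks = 0;
  for (;;) {
    let button;
    try {
      // Fresh handle on every iteration: the previous button was removed from the DOM.
      button = await page.waitForSelector(selector, { timeout: timeoutMs });
    } catch (err) {
      if (err.name === 'TimeoutError') break; // no new button appeared: assume done
      throw err;
    }
    // `delay` is the time in milliseconds between mousedown and mouseup.
    await button.click({ delay: clickDelayMs });
    // Wait until the clicked element is detached, so the next query
    // cannot return the same, now stale, button again.
    await page.waitForFunction((el) => !el.isConnected, { timeout: timeoutMs }, button);
    await button.dispose();
    clicks++;
  }
  return clicks;
}
```

For the question's page this would be called as `await clickLoadMoreUntilGone(page, 'button.sc-fzoiQi')`. The trade-off is that the loop always ends with one `timeoutMs` wait, because "no more button" is only detected by a timeout.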

1 Answer


You can try something like this:

```javascript
do {
  // Get the button, click, wait for the data, get the data.
} while (await page.$(buttonSelector) !== null);
```
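A possible way to fill in that loop body, written as a `while` so the button is re-queried before each click. It is a sketch under stated assumptions: `page` is a Puppeteer Page, `buttonSelector` and `itemSelector` are hypothetical selectors for the target site, and the new "Load more" button is assumed to be in the DOM by the time the new items are (otherwise the loop could stop early). "Wait for the data" is implemented by waiting until the item count grows.

```javascript
// Hypothetical helper filling in "get the button, click, wait for the data, get the data".
// Assumes `page` is a Puppeteer Page; both selectors are placeholders.
async function scrapeAll(page, buttonSelector, itemSelector, timeoutMs = 10000) {
  let button;
  // Fresh query on every iteration, as in the loop condition above.
  while ((button = await page.$(buttonSelector)) !== null) {
    // Count items first so we can tell when the new batch has arrived.
    const before = await page.$$eval(itemSelector, (els) => els.length);
    await button.click();
    await button.dispose();
    // Wait for the data: resolves once there are more items than before the click.
    await page.waitForFunction(
      (sel, n) => document.querySelectorAll(sel).length > n,
      { timeout: timeoutMs },
      itemSelector,
      before,
    );
  }
  // Get the data. Earlier batches stay on the page, so one final read collects everything.
  return page.$$eval(itemSelector, (els) => els.map((el) => el.textContent.trim()));
}
```

Reading the data once at the end relies on the question's observation that earlier content is preserved when more is loaded.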
vsemozhetbyt
  • Unfortunately the "Load more" button is one of many button elements on the page, and the only way to select it is by selecting all the buttons via `page.$$eval()` and then inspecting each element in the returned array using `forEach` with `element.textContent === "Load more"`. – Nick Jul 12 '20 at 20:19
  • You can extract this logic into an async function and call it in the `while` clause, or you can use the `while` loop with `break`. – vsemozhetbyt Jul 12 '20 at 20:26
  • So, I found a way to use `page.$()` to select the button, regardless of how many times it appears on the page. I've updated my question with additional code that employs your suggestion; however, the call to `page.$()` inside the `while` never returns. Ideas? – Nick Jul 14 '20 at 14:49
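A sketch of the "extract this logic into an async function and call it in the `while` clause" suggestion, for the case where the button can only be identified by its text. It assumes `page` is a Puppeteer Page; `findButtonByText` and `clickAllByText` are hypothetical names. Because the helper runs in the loop condition, every iteration works with a fresh handle, which is the fix identified in the question's comments.

```javascript
// Hypothetical helper: returns an ElementHandle for the first <button> whose
// trimmed text equals `label`, or null. Assumes `page` is a Puppeteer Page.
async function findButtonByText(page, label) {
  const handles = await page.$$('button');
  let match = null;
  for (const handle of handles) {
    const text = await handle.evaluate((el) => el.textContent.trim());
    if (match === null && text === label) {
      match = handle;
    } else {
      await handle.dispose(); // release handles we are not returning
    }
  }
  return match;
}

// Calling the helper in the `while` condition gives a fresh handle on every iteration.
async function clickAllByText(page, label = 'Load more') {
  let clicks = 0;
  let button;
  while ((button = await findButtonByText(page, label)) !== null) {
    await button.click();
    // Assumption: the new button is inserted by the time the old one is detached.
    // If the site inserts it later, this loop can stop early; add a site-specific wait.
    await page.waitForFunction((el) => !el.isConnected, {}, button);
    await button.dispose();
    clicks++;
  }
  return clicks;
}
```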