
I'm trying to click the 'Show more' button on a Google search for a recipe multiple times using Puppeteer. I have it working in a for loop like this:

for (let i = 0; i <= numberOfClicks; i++) {
  await page.click('div[aria-label="Show more"]')
  await page.waitForTimeout(800)
}

However, this adds 800ms to every click, which I would really like to avoid to save time on the call I'm making.

I've tried various ways of doing this, and I feel that something like this

for (let i = 0; i <= numberOfClicks; i++) {
  await page.waitForFunction(
    `document.querySelectorAll("g-inner-card a").length > ${i * 9}`
  )
  await page.click('div[aria-label="Show more"]')
}

should do what I need, as Google starts with 3 recipes and opens 9 more every time you click 'Show more'. However, so far I can only get it to click once with this method, and then it hangs.

Any help would be really appreciated, thanks.

dbacall
  • Can you share the URL in question, please, so I can run and test the code myself? `"g-inner-card a"` looks like a bad selector, though. If `g-inner-card` is a class, it should be `".g-inner-card a"`. Whenever you're running code in the browser, it's best to test it by hand in the console or add [listeners](https://stackoverflow.com/questions/58089425/how-do-print-the-console-output-of-the-page-in-puppeter-as-it-would-appear-in-th) so you can see the errors logged. – ggorlen Jul 24 '21 at 19:24
  • So, as an example, https://www.google.com/search?q=spaghetti+bolognese, and you can see if you inspect it that they somehow have `g-inner-card` elements. I'm using `"g-inner-card a"` successfully as a selector in other places in my code for this, and have also checked it in the console. I think it's something to do with the button moving when 'Show more' is clicked, and things needing to be slowed down, which is why it works with the `waitForTimeout`. I just don't get why `waitForFunction` isn't working as I think it should – dbacall Jul 24 '21 at 20:19
  • Thanks for the URL -- yeah, you're right, that is a legitimate element. Your code works for me. What data are you ultimately trying to get here? All I can think is maybe you're making too many requests and they're throttling you, or there's an A/B situation, or something. – ggorlen Jul 24 '21 at 20:37
  • I'm trying to get an array of the recipe URLs. Which I have, but I really don't wanna do it using waitForTimeout. How many clicks were you able to get to work with your code? Anything more than one click hangs for me. Could you copy and paste the code you wrote? Maybe there's something in my code a bit earlier that's causing it to go slow or something. Could you also elaborate a bit more on what you mean by an A/B situation? – dbacall Jul 24 '21 at 21:01
  • I also have a problem that when it does click more than once, sometimes it clicks one of the actual recipes for some reason. I'm struggling to debug cos hard to see what puppeteer has actually done when it goes wrong – dbacall Jul 24 '21 at 21:11
  • It also works with the second code block if I add `slowMo: 500` in the puppeteer.launch object, but again that's a terrible way to do it – dbacall Jul 24 '21 at 21:32

1 Answer


Google's pages tend to be annoying to scrape. Elements have unreliable auto-generated classes and ids and Google seems to have a fondness for unnecessary animations.

Here's what worked best for me after a bit of messing around:

const puppeteer = require("puppeteer");

let browser;
(async () => {
  browser = await puppeteer.launch({headless: false});
  const [page] = await browser.pages();
  const url = "https://www.google.com/search?q=spaghetti+bolognese";
  await page.goto(url, {waitUntil: "networkidle0"});
  const showMoreSel = 'div[aria-label="Show more"]';

  // Keep clicking while the button is actually rendered;
  // offsetParent is null once the button is hidden.
  while (await page.$eval(showMoreSel, el => el.offsetParent !== null)) {
    // Native DOM click inside the page rather than a simulated mouse click
    await page.evaluate(
      sel => document.querySelector(sel).click(),
      showMoreSel
    );
  }

  // Collect the href of every link inside a g-link element
  const urls = await page.$$eval("g-link a",
    els => els.map(e => e.getAttribute("href"))
  );
  console.log(urls);
})()
  .catch(err => console.error(err))
  // browser is undefined if launch() failed, hence the optional call
  .finally(async () => await browser?.close())
;

This uses two tricks to bypass shortcomings with Puppeteer's built-in DOM manipulation abstractions:

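  • page.$('div[aria-label="Show more"]', {visible: true}) always finds the 'Show more' button, even once it's hidden (which indicates the end of the cards). As far as I can tell, page.$ only takes a selector, so the {visible: true} argument is ignored; that option belongs to page.waitForSelector. Checking el.offsetParent !== null natively in an eval seems accurate.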
  • page.click() fails as well. It works by scrolling the element into view, moving the mouse to the element's center, and then dispatching mouse down and up events at those coordinates. That can hit the wrong thing when you target an element in a tight loop while animations are moving things around. Once again, evaluate to the rescue, using a native browser click.
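
If you need a fixed upper bound on clicks, as in the question, the native click can be combined with the question's waitForFunction idea. This is only a sketch: clickShowMore is a made-up helper name, both selectors are the ones from this thread and may stop matching whenever Google changes its markup, and the 5 second timeout is an arbitrary assumption.

```javascript
// Sketch: click "Show more" up to maxClicks times, waiting for new cards
// after each click instead of sleeping for a fixed time.
// `page` is a Puppeteer Page; the selectors are assumptions taken from the thread.
const SHOW_MORE_SEL = 'div[aria-label="Show more"]';
const CARD_SEL = "g-inner-card a";

async function clickShowMore(page, maxClicks) {
  let clicks = 0;
  while (clicks < maxClicks) {
    // Count the cards currently in the DOM so growth can be detected.
    const before = await page.$$eval(CARD_SEL, els => els.length);

    // Native click inside the page; false means the button is gone or hidden.
    const clicked = await page.evaluate(sel => {
      const btn = document.querySelector(sel);
      if (!btn || btn.offsetParent === null) return false;
      btn.click();
      return true;
    }, SHOW_MORE_SEL);
    if (!clicked) break;
    clicks++;

    // Wait until there are more cards than before the click.
    // Rejects with a timeout error if nothing new appears in time.
    await page.waitForFunction(
      (sel, n) => document.querySelectorAll(sel).length > n,
      {timeout: 5000},
      CARD_SEL,
      before
    );
  }
  return clicks; // number of clicks actually performed
}
```

Calling `await clickShowMore(page, numberOfClicks)` after page.goto would replace the loop above. Comparing against the count taken just before each click avoids hard-coding the "3 cards, then 9 more" assumption.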

Assuming g-link a maps 1:1 with the recipe cards is probably brittle, so you might want to come up with a more robust way of finding those elements.
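
One cheap way to make the result a bit less brittle, whatever selector you end up with, is to post-filter the hrefs in Node. The sketch below is an assumption about what counts as a recipe link (an absolute http(s) URL that doesn't point back at a Google host), not something verified against Google's markup, and cleanRecipeUrls is a made-up helper name.

```javascript
// Sketch: keep only absolute http(s) links that leave Google, without duplicates.
function cleanRecipeUrls(hrefs) {
  const seen = new Set();
  const out = [];
  for (const href of hrefs) {
    let parsed;
    try {
      parsed = new URL(href); // throws on null, empty and relative hrefs
    } catch {
      continue;
    }
    // Drop javascript:, mailto: and other non-web schemes.
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") continue;
    // Drop links that point back at a Google domain.
    if (/(^|\.)google\.[a-z.]+$/.test(parsed.hostname)) continue;
    if (seen.has(parsed.href)) continue;
    seen.add(parsed.href);
    out.push(parsed.href);
  }
  return out;
}
```

With the script above, that would be `console.log(cleanRecipeUrls(urls))` instead of logging the raw array.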

ggorlen
  • All I needed to do was change my click to the native browser click and it works now! I suspected it was clicking the wrong place, but then I assumed that because it uses a selector, it clicks based on the element rather than the mouse position. Really good to know. Thanks for the help! – dbacall Jul 24 '21 at 22:13