Currently I have a site that has this in its HTML. I confirmed it by inspecting the elements in Chrome DevTools.

<div class="hdp-photo-carousel" style="transform: translateX(0px);">
  <div class="photo-tile photo-tile-large">

I watch the page open and can see that the element is there. Then, after 30 seconds, I get this error:

UnhandledPromiseRejectionWarning: TimeoutError: waiting for selector ".photo-tile" failed: timeout 30000ms exceeded

My Puppeteer code for this is:

const pptrFirefox = require('puppeteer-firefox');

(async () => {
  const browser = await pptrFirefox.launch({headless: false});
  const page = await browser.newPage();
  await page.goto('https://zillow.com');
  await page.type('.react-autosuggest__input', '8002 Blandwood Rd. Downey, CA 90240');
  await page.click('.zsg-search-button_primary');
  await page.waitForSelector('.photo-tile');
  console.log('did I get this far?');
})();

Can anyone tell me what I'm doing wrong?

danronmoon
FabricioG
  • Did you check the selector in exactly the same browser that `puppeteer-firefox` runs? For me, when the error is shown and I run `document.querySelector(".photo-tile")` in the Firefox console, I get `null`. However, I can see an image block, for which the same selector in Chrome returns the element. Could it be that the page has a different DOM for different browsers or browser versions? – vsemozhetbyt Feb 14 '19 at 00:31
  • The element's class attribute is `photo-tile photo-tile-large`, but I'm only selecting `.photo-tile`. Could this be the problem? @vsemozhetbyt – FabricioG Feb 14 '19 at 00:57
  • I do not think so: both selectors should match the element if it is present. – vsemozhetbyt Feb 14 '19 at 01:03
  • Let me check the selector as you suggest, @vsemozhetbyt – FabricioG Feb 14 '19 at 01:06
  • @FabricioG Try setting `headless: true` and debugging it with "Inspect". – Chuong Tran Feb 14 '19 at 05:44
  • I checked it, and it does appear for me in the Firefox browser. @vsemozhetbyt – FabricioG Feb 14 '19 at 18:15
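The check suggested in the first comment can also be run from the script, so it happens in exactly the browser Puppeteer launched rather than in a separately opened Chrome. A minimal sketch, assuming `page` is a Puppeteer `Page`; the helper name `checkSelectorInAutomatedBrowser` is hypothetical:

```javascript
// Hypothetical helper: report which browser Puppeteer is driving and whether
// the selector matches anything in that browser's DOM right now.
// `page` is assumed to be a Puppeteer Page.
async function checkSelectorInAutomatedBrowser(page, selector) {
  // Browser name/version string of the launched browser.
  const version = await page.browser().version();
  // The callback runs inside the page, like typing into its DevTools console.
  const found = await page.evaluate((sel) => document.querySelector(sel) !== null, selector);
  return { version, found };
}

// Usage, inside an async function with an open Puppeteer page:
//   console.log(await checkSelectorInAutomatedBrowser(page, '.photo-tile'));
```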

2 Answers

You need to add `page.waitForNavigation()` every time the page content updates.

const pptrFirefox = require('puppeteer-firefox');

(async () => {
  const browser = await pptrFirefox.launch({headless: false});
  const page = await browser.newPage();
  // Created once, so this promise settles on the first navigation (the goto below).
  const navigationPromise = page.waitForNavigation({waitUntil: "domcontentloaded"});
  await page.goto('https://zillow.com');
  await navigationPromise;
  await page.type('.react-autosuggest__input', '8002 Blandwood Rd. Downey, CA 90240');
  await page.click('.zsg-search-button_primary');
  // Already settled above, so this second await returns immediately.
  await navigationPromise;
  await page.waitForSelector('.photo-tile');

  console.log('did I get this far?');

})();
Let Me Tink About It
Ram
  • There is no point in awaiting `navigationPromise` more than once: once it has resolved, awaiting it again returns immediately. You probably meant two separate calls to `await page.waitForNavigation({waitUntil: "domcontentloaded"})`. – barney765 Mar 18 '20 at 13:52
  • `goto` already waits for navigation, so there's no need to wait for it twice. Even if you do, the second promise resolves instantly, as pointed out, so that line is a red herring. – ggorlen Feb 06 '23 at 23:13

The site has changed in the 4 years since this was asked, but it's a common story: an element is verified by hand to exist in dev tools, its selector is copied into Puppeteer, and then waiting for it times out.

There are at least a few common reasons for this:

  • The element is in a shadow root
  • The element is in an iframe
  • The element needs to be scrolled into view, or is otherwise out of the viewport
  • The server is detecting your script as a bot and blocking you, or presenting a captcha
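For the iframe case, a selector run against the page only searches the main frame's document, so one way to test that hypothesis is to try the selector in every frame. A minimal sketch, assuming `page` is a Puppeteer `Page`; the helper name `findInAnyFrame` is hypothetical, and it checks once rather than waiting. For the shadow-root case, recent Puppeteer versions document a `pierce/` selector prefix that could be passed as `selector` here; check the documentation for the version you use.

```javascript
// Hypothetical helper: look for a selector in the main frame and in every iframe.
// `page` is assumed to be a Puppeteer Page; only page.frames() and frame.$() are used.
async function findInAnyFrame(page, selector) {
  for (const frame of page.frames()) {
    // frame.$() resolves to an ElementHandle, or null when nothing matches.
    const handle = await frame.$(selector);
    if (handle) {
      return { frame, handle };
    }
  }
  return null; // not found in any currently attached frame
}

// Usage, inside an async function with an open Puppeteer page:
//   const hit = await findInAnyFrame(page, '.photo-tile');
//   console.log(hit ? `found in ${hit.frame.url()}` : 'not found in any frame');
```

If the element turns up in a child frame, calling `waitForSelector` on that frame instead of on `page` is the natural next step.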

One debugging strategy is to run headfully (OP is already doing this, but future visitors may not be). If the code works headfully, the site is likely detecting you as a bot only when you're headless. See the canonical "Why does headless need to be false for Puppeteer to work?" for next steps. `console.log(await page.content())` can help establish whether you're being blocked headlessly.
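To make the `page.content()` check automatic, one option is to capture the HTML and a screenshot only when the wait times out. A minimal sketch, assuming `page` is a Puppeteer `Page`; the helper name `waitOrDump` and the `debug-page` file prefix are hypothetical placeholders:

```javascript
const fs = require('fs');

// Hypothetical helper: wait for a selector; on failure, save what the browser
// actually rendered so a missing element can be told apart from a block page.
// `page` is assumed to be a Puppeteer Page; dumpPrefix is a placeholder path.
async function waitOrDump(page, selector, timeout = 30000, dumpPrefix = 'debug-page') {
  try {
    return await page.waitForSelector(selector, { timeout });
  } catch (err) {
    // Snapshot the current DOM and a full-page screenshot for offline inspection.
    fs.writeFileSync(`${dumpPrefix}.html`, await page.content());
    await page.screenshot({ path: `${dumpPrefix}.png`, fullPage: true });
    throw err; // re-throw so the caller still sees the original error
  }
}

// Usage, inside an async function with an open Puppeteer page:
//   await waitOrDump(page, '.photo-tile');
```

If the saved HTML turns out to be a block page or captcha rather than the listing, the selector was never the problem.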

If running headfully still doesn't work, look at the page to see why. In some cases the page may show a captcha; see "Bypassing CAPTCHAs with Headless Chrome using puppeteer". This appears to be the case in the current question at the time of writing.

Typically, adding more `waitForNavigation` calls and setting timeouts to 0 doesn't help (unless you're navigating between pages with a click or form submission, in which case `waitForNavigation` may be appropriate).
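For the click-or-submit case, the pattern usually suggested is to start waiting for the navigation before triggering it, so a fast navigation cannot complete before the wait begins. This also avoids the reused `navigationPromise` issue raised in the comments on the other answer. A minimal sketch, assuming `page` is a Puppeteer `Page`; the helper name `clickAndWaitForNavigation` is hypothetical:

```javascript
// Hypothetical helper: begin listening for the navigation *before* the click
// that causes it, then wait for both to finish.
// `page` is assumed to be a Puppeteer Page.
async function clickAndWaitForNavigation(page, selector, waitUntil = 'domcontentloaded') {
  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil }), // called first, so it is already listening
    page.click(selector),
  ]);
  return response; // main-document response, or null (e.g. same-document navigation)
}

// Usage with the question's selectors, inside an async function:
//   await page.goto('https://zillow.com'); // goto already waits for its own navigation
//   await page.type('.react-autosuggest__input', '8002 Blandwood Rd. Downey, CA 90240');
//   await clickAndWaitForNavigation(page, '.zsg-search-button_primary');
//   await page.waitForSelector('.photo-tile');
```

If the click only re-renders content without any navigation, `waitForNavigation` will time out; in that case `page.waitForSelector` on its own is the better fit.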

ggorlen
  • What's the solution if "The element needs to be scrolled into view, or is otherwise out of the viewport"? – Danny Cooper Feb 11 '23 at 20:50
  • It depends on the site. There's no silver bullet solution--scraping is about understanding and manipulating site-by-site behavior and adapting your conceptual tools to fit specific issues. – ggorlen Feb 11 '23 at 21:09
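As the reply says, there is no universal answer, but for pages that load content lazily as you scroll, one common starting point is to scroll a viewport at a time and re-check the selector. A minimal sketch, assuming `page` is a Puppeteer `Page`; the helper name `scrollUntilSelector` and the `maxSteps`/`pauseMs` defaults are hypothetical and will likely need tuning per site:

```javascript
// Hypothetical helper: scroll one viewport at a time until the selector matches.
// `page` is assumed to be a Puppeteer Page; maxSteps and pauseMs are placeholders.
async function scrollUntilSelector(page, selector, { maxSteps = 20, pauseMs = 500 } = {}) {
  for (let step = 0; step < maxSteps; step++) {
    const handle = await page.$(selector);
    if (handle) {
      // Bring the element itself into view using the DOM's scrollIntoView().
      await handle.evaluate((el) => el.scrollIntoView());
      return handle;
    }
    // Scroll down by one viewport height in the page context.
    await page.evaluate(() => window.scrollBy(0, window.innerHeight));
    // Give the page time to fetch and render newly visible content.
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
  return null; // gave up after maxSteps scrolls
}

// Usage, inside an async function with an open Puppeteer page:
//   const tile = await scrollUntilSelector(page, '.photo-tile');
```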