Answers so far haven't mentioned a critical fact: it's impossible to write a one-size-fits-all waitUntilPageLoaded
function that works on every page. If it were possble, Puppeteer would surely provide it.
Such a function can't rely on a timeout, because there's always some page that takes longer to load than that timeout. As you extend the timeout to reduce the failure rate, you introduce unnecessary delays when working with fast pages. Timeouts are generally a poor solution, opting out of Puppeteer's event-driven model.
Waiting for idle network requests might not always work if the responses involve long-running DOM updates that take longer than 500ms to trigger a render.
Waiting for the DOM to stop changing might miss slow network requests, long-delayed JS triggers, or ongoing DOM manipulation that might cause the listener never to settle, unless specially handled.
And, of course, there's user interaction: captchas, prompts and cookie/subscription modals that need to be clicked through and dismissed before the page is in a sensible state for a full-page screenshot (for example).
Since every page has different, arbitrary JS behavior, the typical approach is to write event-driven logic that works for a specific page. Making precise, directed assumptions is much better than cobbling together a boatload of hacks that tries to solve every edge case.
If your use case is to write a load event that works on every page, my suggestion is to use some combination of the tools described here that is most balanced to meet your needs (speed vs. accuracy, development time/code complexitiy vs accuracy, etc). Use fail-safes for everything rather than blindly assuming all pages will cooperate with your assumptions. Think hard about what extent you really need to try to handle every web page. Prepare to compromise and accept some degree of failures you can live with.
Here's a quick rundown of the strategies you can mix and match to wait for loads to fit your needs:
page.goto()
and page.waitForNavigation()
default to the load
event, which "is fired when the whole page has loaded, including all dependent resources such as stylesheets and images" (MDN), but this is often too pessimistic; there's no need to wait for a ton of data you don't care about. Often the data is available without waiting for all external resources, so domcontentloaded
should be faster. See my post Avoiding Puppeteer Antipatterns for further discussion.
On the other hand, if there are JS-triggered networks requests after load
, you'll miss that data. Hence networkidle2
and networkidle0
, which wait 500 ms after the number of active network requests are 2 or 0. The motivation for the 2 version is that some sites keep ongoing requests open, which would cause networkidle0
to time out.
If you're waitng for a specific network response that might have a payload (or, for the general case, implementing your own network idle monitor), use page.waitForResponse()
. page.waitForRequest()
, page.waitForNetworkIdle()
and page.on("request", ...)
are also useful here.
If you're waiting for a particular selector to be visible, use page.waitForSelector()
. If you're waiting for a load on a specific page, identify a selector that indicates the state you want to wait for. Generally speaking, for scripts specific to one page, this is the main tool to wait for the state you want, whether you're extracting data or clicking something. Frames and shadow roots thwart this function.
page.waitForFunction()
lets you wait for an arbitrary predicate, for example, checking that the page's HTML or a specific list is a certain length. It's also useful for quickly dipping into frames and shadow roots to wait for predicates that depend on nested state. This function is also handy for detecting DOM mutations.
The most general tool is page.evaluate()
, which plugs code into the browser. You can put just about any conditions you want here; most other Puppeteer functions are convenience wrappers for common cases you could implement by hand with evaluate
.