-1

I have a Puppeteer function that plays back the loading of a site by taking screenshots at a set interval:

const getScreenshots = async (browser, url, ms, frames): Promise<string[]> => {
  const page = await browser.newPage()

  // Set screen size
  await page.setViewport({ width: 1280, height: 800 })

  await page.goto(url, {
    waitUntil: "networkidle0",
  })

  const promises = []
  return new Promise((resolve) => {
    const intervalId = setInterval(async () => {
      promises.push(
        page.screenshot({
          captureBeyondViewport: true,
          fullPage: true,
          encoding: "base64",
        })
      )
      if (promises.length >= frames) {
        clearInterval(intervalId)
        resolve(promises)
      }
    }, ms)
  })
}

const screenshotTest = async (url: string): Promise<string> => {
  const browser = await puppeteer.launch({ timeout: 100000 })
  try {
    const imgArray: any[] = await getScreenshots(browser, url, 42, 24)
    return imgArray[imgArray.length - 1]
  } finally {
    // await browser.close()
  }
}

this works well as long as the await browser.close() line remains commented out, but this allows a zombie process to continue running after the requests have completed. When I actually make this call, the function throws because the browser is being closed before the promises have resolved. This is clearly because the function is running async, though I had anticipated that it would wait for the promises to resolve before the finally block runs because the return value of the try block is awaiting them, but clearly this is not the case.

How can I rewrite this to await the promises before the browser is closed?

I believe I could do something like this:

while (true) {
  if (Promise.all(promises)) {
    await browser.close()
  }
}

but it seems really inefficient to just keep looping like that so I'm hoping I'm missing some simple way to await the promises, then close the browser.

anthumchris
  • 8,245
  • 2
  • 28
  • 53
Harry
  • 4,660
  • 7
  • 37
  • 65

3 Answers3

0

Consider changing your getScreenshots() function be similar to:

const getScreenshots = async (browser, url, ms, frames) => {
  const promises = []
  const page = await browser.newPage()

  return page.goto(url).then(async function capture() {
    await new Promise(r => setTimeout(r, ms)) // delay
    promises.push(page.screenshot())
    if (promises.length < frames) {
      return capture()
    }
    const screenshots = await Promise.all(promises)
    page.close() // await should not be needed
    return screenshots
  })
}

Here's a similar end-to-end test:

import puppeteer from 'puppeteer'

const url = 'https://fetch-progress.anthum.com/30kbps/images/sunrise-progressive.jpg'
let browser

try {
  console.log('capturing', url)
  browser = await puppeteer.launch()
  const screenshots = await getAllScreenshots(browser, url, 500)
  console.log('captured', screenshots.length)
} finally {
  browser?.close()
}

// returns Promise<screenshot[]>
async function getAllScreenshots(browser, url, captureInterval) {
  const page = await browser.newPage()

  // Resolves when page load completes and all screenshots complete
  return new Promise(async (resolve, reject) => {
    const all = []
    let timeoutId = 1

    page.on('error', e => {
      timeoutId = clearTimeout(timeoutId)
      reject(e)
    })
    page.on('load', () => {
      timeoutId = clearTimeout(timeoutId)
      resolve(capture(page, all)) // capture final screenshot
    })

    page.goto(url)

    while (timeoutId) {
      await Promise.all([
        capture(page, all),
        new Promise(r => timeoutId = setTimeout(r, captureInterval))
      ])
    }
  }).finally(() => page.close())
}

// returns Promise<screenshot[]> after adding it to @all
async function capture(page, all) {
  all.push(await page.screenshot())
  return all
}

enter image description here

anthumchris
  • 8,245
  • 2
  • 28
  • 53
0

You are not awaiting the promises. You are awaiting a promise to an array which in turn contains all the screenshot promises. Use Promise.all for that:

await Promise.all(arrayOfPromises);

Or in your case:

await (await getScreenshots(browser, url, 42, 24))

And from that you can see that it did not make sense to wrap the whole thing into new Promise to begin with.

trollkotze
  • 1,070
  • 1
  • 11
  • 19
-2

This seems like a case of the new Promise antipattern. Recent Node versions provide a promisified setTimeout and setInterval that lets you avoid callbacks.

For example, with setTimeout:

const {setTimeout} = require("node:timers/promises");

const getScreenshots = async (
  browser,
  url,
  ms,
  frames
): Promise<string[]> => {
  const page = await browser.newPage();
  await page.setViewport({width: 1280, height: 800});
  await page.goto(url, {waitUntil: "networkidle0"});

  const screenshots = [];

  for (let i = 0; i < frames; i++) {
    const screenshot = await page.screenshot({
      captureBeyondViewport: true,
      fullPage: true,
      encoding: "base64",
    });
    screenshots.push(screenshot);
    await setTimeout(ms);
  }

  return screenshots;
};

With setInterval:

const {setInterval} = require("node:timers/promises");

const getScreenshots = async (
  browser,
  url,
  ms,
  frames
): Promise<string[]> => {
  const page = await browser.newPage();
  await page.setViewport({width: 1280, height: 800});
  await page.goto(url, {waitUntil: "networkidle0"});

  const screenshots = [];

  for await (const startTime of setInterval(ms)) {
    const screenshot = await page.screenshot({
      captureBeyondViewport: true,
      fullPage: true,
      encoding: "base64",
    });
    screenshots.push(screenshot);

    if (screenshots.length >= frames) {
      return screenshots;
    }
  }
};

The calling code is the same, with browser.close() uncommented.

Note that any solution with setTimeout or setInterval will drift over time. Taking screenshots is a complex, non-instantaneous subprocess call anyway, so I imagine it'll be diminishing returns to attempt this, but you could try a tight requestAnimationFrame loop with a performance.now() call and drift correction.

In addition to new Promise, other red flags are using async on a function that doesn't have an await in it, using async on a new Promise and making a setInterval callback async. Even if you don't have a newer Node version or don't have utils.promisify (such as in the browser), it's better to bury the promisification in a one-off function and keep your mainline code callback-free:

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

Other minor suggestions:

  • It's a bit strange that you create an array of screenshots, then throw them all away except for the last one (you can more idiomatically access the last element of an array with .at(-1)).
  • When you have more than a few arguments, as in getScreenshots(browser, url, 42, 24), the recommended approach is to switch to a configuration object, like getScreenshots(browser, url, {ms: 42, frames: 24}) to keep the code readable.
  • I generally prefer my Puppeteer helper functions to accept a page rather than a whole browser. This allows for maximum reusability because the callee isn't forced to create a new page. The caller can set whichever settings and URL on the page ahead of the screenshot call rather than passing them in as parameters.

Here's a complete, runnable example with the above suggestions applied:

const fs = require("node:fs/promises");
const puppeteer = require("puppeteer");
const {setInterval} = require("timers/promises");

const getScreenshots = async (page, opts = {ms: 1000, frames: 10}) => {
  const screenshots = [];

  for await (const startTime of setInterval(opts.ms)) {
    const screenshot = await page.screenshot({
      captureBeyondViewport: true,
      fullPage: true,
      encoding: "base64",
    });
    screenshots.push(screenshot);

    if (screenshots.length >= opts.frames) {
      return screenshots;
    }
  }
};

// Driver code for testing:
const html = `<!DOCTYPE html>
<html>
<body>
<h1></h1>
<script>
let i = 0;
setInterval(() => {
  document.querySelector("h1").textContent = ++i;
}, 10);
</script>
</body>
</html>
`;

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setContent(html);
  const screenshots = await getScreenshots(page, {ms: 100, frames: 10});
  console.log(screenshots.length); // => 10
  const gallery = `<!DOCTYPE html><html><body>
  ${screenshots.map(e => `
    <img alt="test screenshot" src="data:image/png;base64,${e}">
  `)}
  </body></html>`;
  await fs.writeFile("test.html", gallery);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());

Open test.html in your browser to see 10 screenshots at different intervals.

Note how removing newPage from the function gives the caller the ability to shoot screenshots on a setContent rather than a goto.

ggorlen
  • 44,755
  • 7
  • 76
  • 106
  • 1
    That's a lot to read. Keep it simple, please – anthumchris Mar 05 '23 at 22:06
  • 2
    Interesting rationale, but I guess that's your prerogative. The answer is complete, correct, provides a better approach than OP and offers an explanation to defend its design. You're free to ignore the explanation, but it's there for people who care about more than blindly copy-pasting from low-quality, code-only answers. I'll stick to my approach, thanks. – ggorlen Mar 06 '23 at 01:00
  • The original code does not use setTimeout, it uses setInterval, which is a different function with a different purpose. It is intentionally async to ensure each function _starts_ with a predictable interval, regardless of how long the previous call took to complete (and indeed they can and likely will overlap so they must be async). You have a valid point about the throwing away of the screenshots - returning only 1 was testing code that was inadvertently left in, they are all required. WRT browser vs. page, this is a serverless function hence creating a new browser each time. – Harry Mar 06 '23 at 09:04
  • @Harry thanks for the response, but I believe you're misinformed: [`setInterval` is unreliable](https://stackoverflow.com/questions/8173580/setinterval-timing-slowly-drifts-away-from-staying-accurate), so I don't anticipate much difference between the two. Accuracy is another story that hasn't been stated yet in the thread and isn't implied by `setTimeout`. I'd probably use a tight `requestAnimationFrame` loop and check `performance.now()` for starters. – ggorlen Mar 06 '23 at 14:45
  • 1
    @ggorlen this is entirely off-topic for the question, which is about how to close the browser only after the requests have completed. – Harry Mar 06 '23 at 14:46
  • @Harry regarding the browser; it's just an aside, but I think you missed my point. The browser creation in `screenshotTest` is fine. The antipattern is having `browser.newPage` inside `getScreenshots` rather than `screenshotTest`. Not a big deal, just a small tip. – ggorlen Mar 06 '23 at 14:46
  • @Harry OK, well, why did you bring accuracy up, if it's off topic? My code does close the browser after the screenshots have completed, assuming you uncomment your `browser.close()` in the caller. Did you try running it? – ggorlen Mar 06 '23 at 14:47
  • @ggorlen no, `browser.close()` isn't in your code example. I brought up setInterval vs. setTimeout in my comment because you referred to setTimeout a lot in your answer and how a problem with the code is using setTimeout in an async function, despite it not actually being present the code in the question. – Harry Mar 06 '23 at 14:50
  • @Harry the code in the original question is the wrong approach, so I used a function that solves the problem cleanly. Did you try running this? If not, please do. If you did try running this, please explain how it fails to solve your problem and I'll update it. If you want, I can show a correct `setInterval` approach, but it'll be a bit uglier. At this point though, there's been so much dumping on an otherwise correct answer without anyone explaining why it's wrong. Downvotes because I took a slightly different approach and offered suggestions for improvement seem pretty off-base to me. – ggorlen Mar 06 '23 at 14:51
  • Actually, the `setInterval` version turned out pretty clean too, thanks to the async generator version. – ggorlen Mar 06 '23 at 15:10
  • @Harry Re: "how a problem with the code is using setTimeout in an async function", I meant `setInterval`, but same problem either way. – ggorlen Mar 06 '23 at 22:32
  • I've added a complete, runnable example so you can verify the solution. Feel free to let me know if there are any problems using it. – ggorlen Mar 06 '23 at 23:04
  • 1
    This is a good answer. @anthumchris Harry - You are braindead, unappreciative and illiterate subhumans. It is disgusting what a toxic community StackOverflow has become. You did not even notice that you did not await the screenshot promises in the OP. It's no wonder that you don't see how promisified setTimeout is a useful solution for that. Instead of understanding and correcting your mistakes you seem to be looking for something entirely else. – trollkotze May 05 '23 at 11:34