For anyone wondering, there are many strategies to render lazy-loaded images or assets in Puppeteer, but not all of them work equally well. Small implementation details in the website that you're trying to screenshot can change the final result, so if you want an implementation that works well across many scenarios you will need to isolate each generic case and address it individually.
I know this because I run a small Screenshot API service and I had to address many cases separately. This is a big part of the project, since there always seems to be something new to address, with new libraries and UI techniques appearing every day.
That being said, I think there are some rendering strategies that have good coverage. Probably the best one is a combination of waiting and scrolling through the page like OP did, while also taking into account the order of the operations. Here is a slightly modified version of OP's original code.
// Scroll and Wait Strategy
function waitFor(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}
async function capturePage(browser, url) {
  // Load the page that you're trying to screenshot.
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'load' }); // Waiting until 'networkidle2' could work better.
  // Set the viewport before scrolling.
  await page.setViewport({ width: 1366, height: 768 });
  // Get the height of the page after navigating to it.
  // This strategy to calculate the height doesn't always work, though.
  const bodyHandle = await page.$('body');
  const { height } = await bodyHandle.boundingBox();
  await bodyHandle.dispose();
  // Scroll viewport by viewport and allow the content to load.
  const calculatedVh = page.viewport().height;
  let vhIncrease = 0;
  while (vhIncrease + calculatedVh < height) {
    // Here we pass the calculated viewport height to the context
    // of the page and we scroll by that amount.
    await page.evaluate(_calculatedVh => {
      window.scrollBy(0, _calculatedVh);
    }, calculatedVh);
    await waitFor(300);
    vhIncrease = vhIncrease + calculatedVh;
  }
  // Setting the viewport to the full height might reveal extra elements.
  await page.setViewport({ width: 1366, height: Math.ceil(height) });
  // Wait for a little bit more.
  await waitFor(1000);
  // Go back to the original viewport height before taking the screenshot.
  await page.setViewport({ width: 1366, height: calculatedVh });
  // Scroll back to the top of the page by using evaluate again.
  await page.evaluate(() => {
    window.scrollTo(0, 0);
  });
  return await page.screenshot({ type: 'png' });
}
Some key differences here are:
You want to set the viewport from the beginning and operate with that fixed viewport.
You can change the wait time and introduce arbitrary waits to experiment. Sometimes this reveals elements that were stuck waiting on network events.
Changing the viewport to the full height of the page can also reveal elements as if you were scrolling. You can test this in a real browser by using a vertical monitor. However, make sure to go back to the original viewport height, because the viewport also affects the intended rendering.
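To make the "expand, then go back" step harder to get wrong, it could be wrapped in a small helper. This is only a sketch: `withFullHeightViewport` and its `maxHeight` cap are names I made up for illustration (the cap value is arbitrary), and it assumes a viewport was already set on the page so that `page.viewport()` returns an object.

```javascript
// Hypothetical helper: temporarily grow the viewport to the full document
// height, run a callback, then restore the original viewport even if the
// callback throws. Assumes page.setViewport() was called earlier.
async function withFullHeightViewport(page, callback, maxHeight = 10000) {
  const original = page.viewport();
  // Runs inside the page; takes the larger of the two common height measures.
  const fullHeight = await page.evaluate(() => Math.max(
    document.body.scrollHeight,
    document.documentElement.scrollHeight
  ));
  // Viewport dimensions must be integers; the cap is an arbitrary safety limit.
  const height = Math.min(Math.ceil(fullHeight), maxHeight);
  await page.setViewport({ ...original, height });
  try {
    return await callback();
  } finally {
    await page.setViewport(original);
  }
}
```

Inside `capturePage` it could be used as `await withFullHeightViewport(page, () => waitFor(1000));`, so the original viewport is back in place before the screenshot is taken.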
One thing to understand here is that waiting alone is not necessarily going to trigger the loading of lazy assets. Scrolling through the height of the document brings into the viewport those elements that need to be visible before they get loaded.
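The loop in the example above measures the page height once, but lazy content can make the document grow while you scroll. A variation that re-checks the height after every step might look like the sketch below. `scrollToBottom`, `stepDelayMs` and `maxSteps` are hypothetical names and defaults of mine, and `maxSteps` is just a guard so infinite-scroll pages don't loop forever.

```javascript
// Resolve after `ms` milliseconds.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical helper: scroll one viewport at a time until the bottom is
// reached, re-reading the document height on every step.
// Returns the number of scroll steps performed.
async function scrollToBottom(page, { stepDelayMs = 300, maxSteps = 50 } = {}) {
  let steps = 0;
  while (steps < maxSteps) {
    // This function runs inside the page, so it only uses browser globals.
    const atBottom = await page.evaluate(() => {
      window.scrollBy(0, window.innerHeight);
      const scrolledTo = window.scrollY + window.innerHeight;
      // 1px tolerance for fractional scroll positions.
      return scrolledTo >= document.documentElement.scrollHeight - 1;
    });
    steps += 1;
    // Give lazy assets a moment to start loading before the next step.
    await sleep(stepDelayMs);
    if (atBottom) break;
  }
  return steps;
}
```

You would call `await scrollToBottom(page)` after setting the viewport, and still scroll back to the top before taking the screenshot.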
Another caveat is that sometimes you need to wait a relatively long time for an asset to load, so in the example above you might need to experiment with how long you wait after each scroll. Also, as I mentioned, arbitrary waits in the general execution sometimes affect whether an asset loads or not.
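Instead of guessing a fixed delay, one option is to wait until the images that are currently in the document have finished loading, with a timeout as a fallback. This is a rough sketch, not something that covers every case (CSS background images and assets added later are not tracked). `waitForImages` and its default timeout are hypothetical choices of mine.

```javascript
// Hypothetical helper: wait until every <img> currently in the document has
// loaded or failed, giving up after `timeoutMs`.
// Returns how many images were still pending when it stopped waiting.
async function waitForImages(page, timeoutMs = 5000) {
  // The callback runs inside the page, so it only uses browser globals.
  return page.evaluate(async timeout => {
    const pending = Array.from(document.images).filter(img => !img.complete);
    let unsettled = pending.length;
    const settled = Promise.all(pending.map(img => new Promise(resolve => {
      const done = () => { unsettled -= 1; resolve(); };
      img.addEventListener('load', done, { once: true });
      img.addEventListener('error', done, { once: true });
    })));
    const timer = new Promise(resolve => setTimeout(resolve, timeout));
    // Whichever finishes first ends the wait.
    await Promise.race([settled, timer]);
    return unsettled;
  }, timeoutMs);
}
```

For example, `const stillPending = await waitForImages(page, 3000);` after each scroll step, and if `stillPending` is greater than zero you can decide whether to wait longer or take the screenshot anyway.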
In general, when using Puppeteer for screenshots, you want to make sure that your logic resembles real user behavior. Your goal is to reproduce rendering scenarios as if someone were firing up Chrome on their computer and navigating to that website.