
I can't seem to capture a screenshot of https://today.line.me/HK/pc successfully.

In my Puppeteer script, I also scroll to the bottom of the page and back up again to make sure images are loaded. But for some reason it doesn't seem to work on the LINE URL above.

const puppeteer = require('puppeteer');

function wait(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

async function run() {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://today.line.me/HK/pc', { waitUntil: 'load' });

  // Get the height of the rendered page
  const bodyHandle = await page.$('body');
  const { height } = await bodyHandle.boundingBox();
  await bodyHandle.dispose();

  // Scroll one viewport at a time, pausing to let content load
  const viewportHeight = page.viewport().height + 200;
  let viewportIncr = 0;
  while (viewportIncr + viewportHeight < height) {
    await page.evaluate(_viewportHeight => {
      window.scrollBy(0, _viewportHeight);
    }, viewportHeight);
    await wait(4000);
    viewportIncr = viewportIncr + viewportHeight;
  }

  // Scroll back to top
  await page.evaluate(_ => {
    window.scrollTo(0, 0);
  });

  // Some extra delay to let images load
  await wait(2000);

  await page.setViewport({ width: 1366, height: 768 });
  await page.screenshot({ path: './image.png', fullPage: true });
  await browser.close();
}

run().catch(console.error);

Puppeteer Line Screenshot Fails

Slay

3 Answers


For anyone wondering: there are many strategies for rendering lazy-loaded images or other assets in Puppeteer, but not all of them work equally well. Small implementation details in the website you're trying to screenshot can change the final result, so if you want an implementation that works well across many scenarios, you will need to isolate each generic case and address it individually.

I know this because I run a small screenshot API service and had to address many cases separately. This is a big part of that project, since there always seems to be something new to handle as new libraries and UI techniques appear every day.

That being said, I think some rendering strategies have good coverage. Probably the best one is a combination of waiting and scrolling through the page, as OP did, while also paying attention to the order of the operations. Here is a slightly modified version of OP's original code.

//Scroll and Wait Strategy

function waitFor (ms) {
  return new Promise(resolve => setTimeout(() => resolve(), ms));
}

async function capturePage(browser, url) {
  // Load the page that you're trying to screenshot.
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'load' }); // waitUntil: 'networkidle2' may work better on some pages.


  // Set the viewport before scrolling
  await page.setViewport({ width: 1366, height: 768});

  // Get the height of the page after navigating to it.
  // This strategy to calculate height doesn't work always though. 
  const bodyHandle = await page.$('body');
  const { height } = await bodyHandle.boundingBox();
  await bodyHandle.dispose();

  // Scroll viewport by viewport, allow the content to load
  const calculatedVh = page.viewport().height;
  let vhIncrease = 0;
  while (vhIncrease + calculatedVh < height) {
    // Here we pass the calculated viewport height to the context
    // of the page and we scroll by that amount
    await page.evaluate(_calculatedVh => {
      window.scrollBy(0, _calculatedVh);
    }, calculatedVh);
    await waitFor(300);
    vhIncrease = vhIncrease + calculatedVh;
  }

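  // Setting the viewport to the full page height might reveal extra elements.
  // Round up, since the bounding-box height can be fractional.
  await page.setViewport({ width: 1366, height: Math.ceil(height) });

  // Wait for a little bit more
  await waitFor(1000);

  // Go back to the original viewport height, because it affects the rendering
  await page.setViewport({ width: 1366, height: calculatedVh });

  // Scroll back to the top of the page by using evaluate again.
  await page.evaluate(_ => {
    window.scrollTo(0, 0);
  });

  return await page.screenshot({ type: 'png', fullPage: true });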
}
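The body's bounding box above is measured only once, before any scrolling, so content that lazy loading appends to the page while scrolling is never reached. A possible variant re-reads the document height before every step and stops once the next step would go past the bottom. This is a sketch, not part of the original answer: scrollToBottom and its stepPx, delayMs and maxSteps options are hypothetical names, and it measures height with scrollHeight instead of the body's bounding box.

```javascript
// Hypothetical variant of the scroll loop in capturePage: re-measure the
// document height before every step, so content that lazy loading appends
// while scrolling is also brought into view.
async function scrollToBottom(page, { stepPx, delayMs = 300, maxSteps = 50 }) {
  let position = 0;
  // maxSteps guards against pages that keep growing (e.g. infinite feeds).
  for (let step = 0; step < maxSteps; step++) {
    const height = await page.evaluate(() =>
      Math.max(document.body.scrollHeight, document.documentElement.scrollHeight)
    );
    // Same stopping rule as the original loop: stop once the next step
    // would reach past the bottom of the document.
    if (position + stepPx >= height) break;
    position += stepPx;
    await page.evaluate((y) => window.scrollTo(0, y), position);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return position; // last offset that was scrolled to
}
```

Inside capturePage it could stand in for the while loop, e.g. `await scrollToBottom(page, { stepPx: calculatedVh });`.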

Some key differences here are:

  • You want to set the viewport from the beginning and operate with that fixed viewport.

  • You can change the wait times and introduce arbitrary waits to experiment. Sometimes this reveals elements that are held back by pending network events.

  • Changing the viewport to the full height of the page can also reveal elements, as if you were scrolling. You can test this in a real browser by using a vertical monitor. However, make sure to go back to the original viewport height afterwards, because the viewport also affects the intended rendering.
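The full-height viewport trick from the last point can be wrapped in a small helper that always restores the original size. This is a sketch under assumptions: withFullHeightViewport and settleMs are hypothetical names, fullHeight is assumed to be the page height measured earlier, and the try/finally restores the viewport even if the wait in between fails. It only uses Puppeteer's page.viewport() and page.setViewport().

```javascript
// Hypothetical helper: temporarily grow the viewport to the full document
// height (which can reveal lazily rendered elements), then restore it.
async function withFullHeightViewport(page, fullHeight, settleMs = 1000) {
  const original = page.viewport(); // Puppeteer returns null if none is set
  if (!original) throw new Error("set a viewport with page.setViewport() first");
  try {
    // Round up, since measured heights can be fractional.
    await page.setViewport({ ...original, height: Math.ceil(fullHeight) });
    await new Promise((resolve) => setTimeout(resolve, settleMs));
  } finally {
    // Go back to the original viewport, since it affects the intended rendering.
    await page.setViewport(original);
  }
}
```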

One thing to understand here is that waiting alone is not necessarily going to trigger the loading of lazy assets. Scrolling through the height of the document lets the viewport reach the elements that only start loading once they are inside it.
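To see whether waiting or scrolling actually triggered the loads, one option is to list the images that are still incomplete, for example once before and once after scrolling. This is a diagnostic sketch: pendingImages is a hypothetical name, and it relies on the DOM's HTMLImageElement.complete flag, which only reflects the image's current src. Lazy-loading libraries that show a placeholder src until the image is scrolled into view will report that placeholder as complete.

```javascript
// Hypothetical diagnostic: return the URLs of <img> elements that have not
// finished loading yet, evaluated inside the page.
async function pendingImages(page) {
  return page.evaluate(() =>
    Array.from(document.images)
      .filter((img) => !img.complete)
      .map((img) => img.currentSrc || img.src)
  );
}
```

Logging `await pendingImages(page)` before and after the scroll loop shows which images only load once they enter the viewport.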

Another caveat is that sometimes you need to wait a relatively long time for an asset to load, so in the example above you might need to experiment with how long you wait after each scroll. Also, as I mentioned, arbitrary waits elsewhere in the execution sometimes affect whether an asset loads or not.
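Instead of guessing a single fixed delay, a wait can also poll a condition and give up after a time limit. The sketch below is plain JavaScript with hypothetical names (waitUntil, timeoutMs, intervalMs); Puppeteer's own page.waitForFunction does a similar polling job inside the page.

```javascript
// Hypothetical helper: poll `check` until it returns a truthy value or the
// time limit passes, instead of sleeping for one fixed duration.
async function waitUntil(check, { timeoutMs = 10000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // timed out; the caller decides whether to screenshot anyway
}
```

For example, after the scroll loop, `await waitUntil(() => page.evaluate(() => Array.from(document.images).every((img) => img.complete)), { timeoutMs: 5000 })` waits up to 5 seconds for every image in the document to report complete.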

In general, when using Puppeteer for screenshots, you want your logic to resemble real user behavior. Your goal is to reproduce rendering scenarios as if someone were firing up Chrome on their computer and navigating to that website.

whoisjuan

I resolved this issue by changing how I scroll the page and how long I wait between scrolls.

Slay
    How did you change the logic specifically? Showing your full code with the exact changes helps future visitors. – ggorlen Feb 12 '21 at 23:54

A solution that worked for me:

Adjust the timeout limit for my test runner (mocha).

// package.json
"scripts": {
  "start": "react-scripts start",
  "build": "react-scripts build",
  "eject": "react-scripts eject",
  "test": "mocha --timeout=5000"
},

Set --timeout to something higher than Mocha's default of 2 seconds (2000 ms).

Wait for x seconds, where x is roughly half of the timeout set above, then take the screenshot.

const path = require("path"); // built into Node.js
// Inside an async Mocha test, where `page` is the Puppeteer page under test:
await new Promise((resolve) => setTimeout(resolve, 2000));
const file_path = path.join(__dirname, "__screenshots__/initial.png");
await page.screenshot({ path: file_path });
lbragile