I'm trying to generate a PDF of a web page and save it to local disk so I can email it later.

I tried this approach, but it doesn't work for pages like this. I'm able to generate the PDF, but it doesn't match the web page content.

It's clear that the PDF is generated before document.ready fires, or something else is going wrong; I can't figure out the exact issue. I'm just looking for an approach that lets me save a web page's output as a PDF.

Is generating a PDF of a web page better suited to Node than to PHP? A PHP solution would be a big help, but a Node implementation is fine too.

TylerH
Mahesh.D
  • Can you share the code? – Vaviloff May 08 '18 at 15:18
  • You may need to add a setTimeout to make sure the whole page, including its JavaScript-generated parts, is ready before rendering it... – quarks May 09 '18 at 01:38
  • @xybrek I tried setTimeout before posting this question, but it didn't help either – Mahesh.D May 09 '18 at 04:52
  • @Vaviloff I tried the same code snippet mentioned in 'c' [here](https://stackoverflow.com/a/16124992/1111502), except that I changed the [url](http://www.chartjs.org/samples/latest/charts/pie.html) – Mahesh.D May 09 '18 at 04:54
  • You get to choose the point where the screenshot is generated. Just make sure the document is actually ready when you do that – apokryfos May 09 '18 at 08:15

3 Answers

> Its very clear that pdf is generated before document ready

Very true, so it is necessary to wait until after scripts are loaded and executed.


You linked to an answer that uses the phantom Node module.

The module has been upgraded since then and now supports async/await, which makes scripts much more readable.

Here is a solution that uses the async/await version (version 4.x, requires Node 8+):

const phantom = require('phantom');

const timeout = ms => new Promise(resolve => setTimeout(resolve, ms));

(async function() {
  const instance = await phantom.create();
  const page = await instance.createPage();

  await page.property('viewportSize', { width: 1920, height: 1024 });

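  const status = await page.open('http://www.chartjs.org/samples/latest/charts/pie.html');

  // Stop early if the page could not be loaded
  if (status !== 'success') {
    console.error('Page failed to load:', status);
    await instance.exit();
    return;
  }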

  // If a page has no set background color, it will have gray bg in PhantomJS
  // so we'll set white background ourselves
  await page.evaluate(function(){
      document.querySelector('body').style.background = '#fff';
  });

  // Let's benchmark
  console.time('wait');

  // Wait until the script creates the canvas with the charts
  while (0 == await page.evaluate(function(){ return document.querySelectorAll("canvas").length }) )  {
      await timeout(250);
  }

  // Make sure animation of the chart has played
  await timeout(500);

  console.timeEnd('wait');

  await page.render('screen.pdf');

  await instance.exit();
})();

On my dev machine, waiting for the chart to be ready takes 600 ms. That is much better than calling await timeout(3000) or waiting any other arbitrary number of seconds.
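The polling loop in the script above never gives up if the canvas does not appear. As a minimal sketch of one way to bound it, the helper below polls a check function until it returns a truthy value or a deadline passes. The names `waitFor`, `sleep`, `intervalMs`, and `timeoutMs` are hypothetical and are not part of the phantom module.

```javascript
// Hypothetical helper, not part of the phantom module.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Calls `check` repeatedly until it returns (or resolves to) a truthy value.
// Rejects if the condition is still not met after `timeoutMs` milliseconds.
async function waitFor(check, { intervalMs = 250, timeoutMs = 10000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await check();
    if (result) {
      return result;
    }
    if (Date.now() >= deadline) {
      throw new Error('waitFor: condition not met within ' + timeoutMs + ' ms');
    }
    await sleep(intervalMs);
  }
}

// Possible use inside the script above (assumes `page` from phantom):
//   await waitFor(() => page.evaluate(function () {
//     return document.querySelectorAll('canvas').length;
//   }));
```

With this in place, a page that never creates a canvas would produce an error after the timeout instead of hanging the script.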

Vaviloff
  • But I think it's not an ideal solution because we don't know the exact page load time! Is there an event for this? – Mahesh.D May 09 '18 at 09:53
  • Yes there is, [onLoadFinished](http://phantomjs.org/api/webpage/handler/on-load-finished.html). For the sake of simplicity I didn't use it in the example, but to save time you should definitely move the main logic into this callback. – Vaviloff May 09 '18 at 13:40
  • I tried `onLoadFinished`, but the page is rendered before it has finished loading; [here](https://jsfiddle.net/o4cg7vob/) is the sample code – Mahesh.D May 10 '18 at 06:13
  • My bad, `onLoadFinished` is irrelevant here: we must wait for the scripts to finish their work. So I changed the answer to wait only for the necessary amount of time, no more. – Vaviloff May 10 '18 at 06:51

I did something similar using the html-pdf package.

The code is simple; you can use it like this:

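const pdf = require('html-pdf');

// html: the page markup as a string; options: html-pdf settings, e.g. { format: 'A4' }
pdf.create(html, options).toFile('./YourPDFName.pdf', function(err, res) {
  if (err) {
    console.log(err);
  }
});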

See more about it on the package page here.

Hope it helps you.

Fernando Paz

When saving HTML to PDF, a page whose content is built by scripts over time simply needs a suitable delay before printing. The results here are at half a second (500 ms) and 1 second (1000 ms); increase the delay if the page is more complex or your connection or PC is slower.

With Chrome or Edge, call the browser in headless mode with a larger time allowance (--virtual-time-budget, in milliseconds):

"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe" --headless --print-to-pdf=C:/data/output2.pdf --print-to-pdf-no-header --virtual-time-budget=1000 "https://www.chartjs.org/docs/latest/samples/other-charts/pie.html" && timeout 3 && C:/data/output2.pdf
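Since the question asks for a Node option, the same command could also be launched from a script. This is only a sketch under stated assumptions: `buildPrintArgs` and `printToPdf` are hypothetical helper names, the flags are copied from the command above, and the browser path has to be supplied by the caller.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: builds the argument list for a headless
// Edge/Chrome print-to-PDF run, using only the flags from the command above.
function buildPrintArgs(url, outputPath, budgetMs) {
  return [
    '--headless',
    `--print-to-pdf=${outputPath}`,
    '--print-to-pdf-no-header',
    `--virtual-time-budget=${budgetMs}`,
    url,
  ];
}

// Hypothetical wrapper: starts the browser and resolves with outputPath
// once the process exits without error.
function printToPdf(browserPath, url, outputPath, budgetMs = 1000) {
  return new Promise((resolve, reject) => {
    execFile(browserPath, buildPrintArgs(url, outputPath, budgetMs), err =>
      (err ? reject(err) : resolve(outputPath)));
  });
}

// Possible use (paths are the ones from the command above):
//   printToPdf(
//     'C:\\Program Files (x86)\\Microsoft\\Edge\\Application\\msedge.exe',
//     'https://www.chartjs.org/docs/latest/samples/other-charts/pie.html',
//     'C:/data/output2.pdf',
//     1000
//   ).then(file => console.log('Saved', file));
```

Passing the arguments as an array to `execFile` avoids shell quoting problems with paths that contain spaces.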


K J