I want to convert HTML to PDF. The HTML may contain JavaScript and CSS (including some external fonts).
Our application is written in Java, so I am looking for a Java API. I found iText 7, which can render HTML+CSS but not JavaScript. As explained by Bruno Lowagie here, iText cannot execute JavaScript; to render it, we have to use a browser engine implementation such as WebKit or Gecko.
We started using wkhtmltopdf, which is based on Qt WebKit. It has its own issues, though: it breaks tables and other content that do not fit on a single page, which makes the PDF look awkward.
We want either a Java API that runs HTML+CSS+JavaScript and gives us a string or stream that we can feed to iText 7 to convert into a PDF, or an API that converts HTML+CSS+JavaScript to PDF directly. I could not find any standard Java API for WebKit or Gecko.
Please let me know if there is any Java implementation available for WebKit or Gecko.
We explored Puppeteer. We are able to render the PDF, but not satisfactorily. For example, take a look at the Medium or NYTimes Chinese pages, which have lots of dynamic content and fonts. Here is our code.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Wait until there have been no network connections for at least 500 ms
  await page.goto('https://www.medium.com', { waitUntil: 'networkidle0' });
  //await page.waitFor(10000);
  await page.pdf({
    path: 'output.pdf',
    format: 'Letter',
    margin: {
      top: '10mm',
      right: '10mm',
      bottom: '10mm',
      left: '10mm',
    },
    displayHeaderFooter: false,
    printBackground: true,
    scale: 1.0,
  });
  await browser.close();
})();
When we generate a PDF of Medium with Puppeteer, the page is not rendered completely, and Adobe Acrobat Reader DC (macOS) complains "Cannot extract the embedded font 'T3Font_". When rendering NYTimes Chinese, we get the same font error. We tried different wait options with no luck. However, if I open the generated PDFs in the default PDF reader on CentOS, I do not see any error, probably because I have the xorg-x11-fonts-Type1 and xorg-x11-fonts-75dpi fonts installed on the CentOS server.
How do we make Puppeteer wait until the entire page is rendered, so that it does not miss the fonts?
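One thing that might be worth trying here is to wait explicitly for web fonts and lazily loaded content before calling `page.pdf()`. The sketch below is only an illustration under assumptions: `waitForFullRender`, `scrollStep`, and `scrollDelayMs` are hypothetical names, not part of Puppeteer. It relies on `page.evaluate()` and on the browser's standard `document.fonts.ready` promise. It may not remove the Acrobat warning, since that could also depend on how the fonts end up embedded in the PDF.

```javascript
// Hypothetical helper (not part of Puppeteer): scrolls through the page to
// trigger lazy loading, then waits for the browser to finish loading fonts.
async function waitForFullRender(page, { scrollStep = 500, scrollDelayMs = 100 } = {}) {
  // Runs inside the page: scroll down in steps so lazily loaded images and
  // sections get requested, then return to the top.
  await page.evaluate(async (step, delay) => {
    await new Promise((resolve) => {
      let scrolled = 0;
      const timer = setInterval(() => {
        window.scrollBy(0, step);
        scrolled += step;
        if (scrolled >= document.body.scrollHeight) {
          clearInterval(timer);
          resolve();
        }
      }, delay);
    });
    window.scrollTo(0, 0);
  }, scrollStep, scrollDelayMs);

  // document.fonts.ready resolves once pending font loads have completed.
  // Return a plain boolean because the FontFaceSet itself is not serializable.
  await page.evaluate(() => document.fonts.ready.then(() => true));
}
```

In the script above it would be called between `page.goto(...)` and `page.pdf(...)`, for example `await waitForFullRender(page);`.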
A Java API that renders the PDF would also be a good option.
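If iText 7 has to stay on the Java side, one possible arrangement is to let Puppeteer only execute the JavaScript and hand the resulting HTML string to the Java code. The following is a minimal sketch of that idea, not a tested pipeline: `renderToHtml` is a hypothetical name, and it assumes that the serialized DOM returned by `page.content()` is good enough for iText's HTML+CSS conversion, which may not hold for pages that depend on externally hosted fonts or styles.

```javascript
// Hypothetical helper: loads a URL in a new tab, lets the page's scripts run,
// and returns the serialized HTML of the resulting DOM.
async function renderToHtml(browser, url) {
  const page = await browser.newPage();
  try {
    // Wait until the network has been idle so script-driven content has loaded.
    await page.goto(url, { waitUntil: 'networkidle0' });
    // page.content() returns the full HTML of the page, including the doctype.
    return await page.content();
  } finally {
    // Always close the tab, even if navigation fails.
    await page.close();
  }
}
```

The returned string could then be written to a file or to standard output and read by the Java process that calls iText 7.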
I am currently using macOS. When I open the generated PDF files in Adobe Acrobat Reader DC,