I am trying to merge an arbitrary number of PDF buffers from Puppeteer into a single file. I suspect the problem has something to do with the buffers, but I have yet to find a solution that works. This question got me the closest: How to output a PDF buffer to browser using NodeJS?, but the merged output still fails to load. Adobe, Chrome, and Fox-IT all say it's corrupt or decoded incorrectly.

Puppeteer code:

async function generateBulkPDFFromUrl(urlString) {
  // launch a new chrome instance
  const browser = await puppeteer.launch({
      headless: true,
      args: ['--font-render-hinting=none']
  });

  // create a new page
  const page = await browser.newPage();
  await page.setUserAgent(
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) ' +
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36'
  );

  // validate the URL and navigate to it
  const url = new URL(urlString);

  await page.goto(url.href, { waitUntil: 'domcontentloaded' });

  // create a pdf buffer
  const pdfBuffer = await page.pdf({
       format: 'A4'
  });

  // close the browser
  await browser.close();

  return pdfBuffer;
}

Buffer from Puppeteer, in case you want to see it:

<Buffer 25 50 44 46 2d 31 2e 34 0a 25 d3 eb e9 e1 0a 31 20 30 20 6f 62 6a 0a 3c 3c 2f 43 72 65 61 74 6f 72 20 28 43 68 72 6f 6d 69 75 6d 29 0a 2f 50 72 6f 64 ... 52960 more bytes>

Array of Buffers sent to Merge Code:

[
  <Buffer 25 50 44 46 2d 31 2e 34 0a 25 d3 eb e9 e1 0a 31 20 30 20 6f 62 6a 0a 3c 3c 2f 43 72 65 61 74 6f 72 20 28 43 68 72 6f 6d 69 75 6d 29 0a 2f 50 72 6f 64 ... 52960 more bytes>,
  <Buffer 25 50 44 46 2d 31 2e 34 0a 25 d3 eb e9 e1 0a 31 20 30 20 6f 62 6a 0a 3c 3c 2f 43 72 65 61 74 6f 72 20 28 43 68 72 6f 6d 69 75 6d 29 0a 2f 50 72 6f 64 ... 93378 more bytes>
]
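As a sanity check on these inputs, the leading bytes of both buffers decode to a normal PDF header, which suggests the Puppeteer output itself is well formed. The sketch below uses only Node's built-in `Buffer`; `looksLikePdf` is a hypothetical helper, not part of Puppeteer or pdf-lib, and it only checks the header and trailer markers rather than validating the whole file.

```javascript
// Hypothetical helper (not from any library): a cheap check that a byte
// sequence starts with the PDF header and has an %%EOF marker near the end.
function looksLikePdf(bytes) {
  const buf = Buffer.from(bytes); // accepts a Buffer or a Uint8Array
  const header = buf.subarray(0, 5).toString('latin1');
  const tail = buf.subarray(Math.max(0, buf.length - 1024)).toString('latin1');
  return header === '%PDF-' && tail.includes('%%EOF');
}

// The first nine bytes shown in the buffer dump above.
const head = Buffer.from('255044462d312e340a', 'hex');
console.log(JSON.stringify(head.toString('latin1'))); // "%PDF-1.4\n"

// A tiny synthetic input, just to exercise the helper.
console.log(looksLikePdf(Buffer.from('%PDF-1.4\n...\n%%EOF\n', 'latin1'))); // true
console.log(looksLikePdf(Buffer.from('<html>', 'latin1')));                 // false
```

Running the same check on the merged output, right before it is written or sent, could help narrow down whether the corruption happens during the merge or afterwards.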

Merge Code (PDF-LIB):

async function mergePdfs(pdfsToMerges) {
  const mergedPdf = await PDFDocument.create();
  const actions = pdfsToMerges.map(async pdfBuffer => {
    const pdf = await PDFDocument.load(pdfBuffer);
    const copiedPages = await mergedPdf.copyPages(pdf, pdf.getPageIndices());

    copiedPages.forEach((page) => {
      mergedPdf.addPage(page);
    });
  });

  await Promise.all(actions);

  return await mergedPdf.save();
}
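One side effect of the `map` plus `Promise.all` pattern above: each callback adds its pages only after its own awaits finish, so pages are appended in completion order rather than input order. The sketch below simulates that with timers, using only built-in JavaScript; `fakeLoad` is a hypothetical stand-in for an asynchronous step such as loading a document, and the delays are made up. This concerns page order and is not necessarily the cause of the corruption.

```javascript
// Hypothetical stand-in for an async step such as loading a PDF;
// `ms` simulates how long that step takes.
const fakeLoad = (name, ms) =>
  new Promise((resolve) => setTimeout(() => resolve(name), ms));

// Concurrent: every callback appends as soon as its own await finishes,
// so the result follows completion order.
async function appendConcurrently(items) {
  const out = [];
  await Promise.all(items.map(async ({ name, ms }) => {
    out.push(await fakeLoad(name, ms));
  }));
  return out;
}

// Sequential: each item is appended before the next one starts,
// so the result follows input order.
async function appendSequentially(items) {
  const out = [];
  for (const { name, ms } of items) {
    out.push(await fakeLoad(name, ms));
  }
  return out;
}

const items = [{ name: 'first', ms: 50 }, { name: 'second', ms: 5 }];
appendConcurrently(items).then((r) => console.log('concurrent:', r));
appendSequentially(items).then((r) => console.log('sequential:', r));
```

With these delays the concurrent version puts `second` before `first`, while the `for...of` version keeps the input order. If page order matters in the merged file, the sequential shape is the safer choice.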

I can get a single PDF to download fine; it's the merge that seems to be the issue. Any insight would be helpful. Thank you.

  • Does this answer your question? [Puppeteer Generate PDF from multiple HTML strings](https://stackoverflow.com/questions/48510210/puppeteer-generate-pdf-from-multiple-html-strings) – ggorlen Jun 17 '22 at 19:33
  • @ggorlen no but thank you, I had already been on google for hours. – Jonathan E. Emmett Jun 17 '22 at 23:13
  • OK, just making sure you've tried [`pdf-merger-js`](https://www.npmjs.com/package/pdf-merger-js). – ggorlen Jun 17 '22 at 23:14

1 Answer

So I was correct that it was a buffer issue. I'm not sure PDF-LIB can handle the buffers from Puppeteer correctly. If anyone has another solution, I'm all ears... er, fingers; well, eyes.

Here is what I got to work, found at https://npm.io/package/merge-pdf-buffers

It seems a bit simplistic, and I'm not sure the author is still supporting it, so I may pull it down and see what it's doing. Maybe I'll support it myself.

However, at this time this solution is working.
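A possible explanation for the buffer issue, not confirmed here: pdf-lib's `save()` resolves to a `Uint8Array`, not a Node `Buffer`, and code that treats only `Buffer` values as binary may serialize a plain `Uint8Array` differently. Since the question does not show how the merged bytes were sent, this is an assumption. The sketch below uses only Node built-ins to show the difference; `toNodeBuffer` is a hypothetical helper name.

```javascript
// Hypothetical helper: expose a Uint8Array's bytes as a Node Buffer
// without copying (both views share the same underlying memory).
function toNodeBuffer(bytes) {
  return Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength);
}

// Stand-in for bytes returned by a PDF library: the characters "%PDF-".
const bytes = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d]);

console.log(Buffer.isBuffer(bytes));  // false
console.log(JSON.stringify(bytes));   // {"0":37,"1":80,"2":68,"3":70,"4":45}

const buf = toNodeBuffer(bytes);
console.log(Buffer.isBuffer(buf));    // true
console.log(buf.toString('latin1'));  // %PDF-
```

If the merged bytes ever pass through JSON or a string conversion on the way to the client, the result would no longer be a valid PDF, so wrapping them in a `Buffer` before writing or sending is a cheap thing to try.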