2

I am struggling to apply text redaction in a PDF file in a aws lambda function written in NodeJs. Here is a list of libraries that I have tried with no success:

pdf-lib: This library almost fulfils all the requirements except that it doesn't redact the text permanently as part of its limitations https://github.com/Hopding/pdf-lib/issues/827

PDF.js: To overcome the above limitation, tried to covert the pdf to an image, so the redaction black boxes are applied permanently. Example code here: https://github.com/mozilla/pdf.js/blob/master/examples/node/pdf2png/pdf2png.js However, this lib is not reliable as this cannot extract contents from most pdfs during the process.

Finally, Pdf2Pic: This library helps to overcome the limitation of the first library (pdf-lib) by the converting the pdf into images. But this library internally uses two non node based libraries (graphicsmagick and ghostscript) which I am trying to avoid.

Is there a nodejs based solution that can be used to apply redaction permanently on a pdf file or any solution that can be used to covert a pdf to images to overcome limitation of pdf-lib.

Mahedi Hassan
  • 93
  • 1
  • 10

0 Answers0