In my application, a user can upload a PDF which other users can later view. For my use case, I need to ensure that the PDFs are not locked or encrypted and can be viewed by any other user.

To do this, I am asking users to upload unlocked PDFs and would like to throw an error if the PDF is locked, before I try to upload to S3.

I haven't found a consensus on the best way to do this in the browser. Do I try to read the buffer and throw an error if I am unable to? Or is there another performant and efficient way of detecting this?

geoboy
  • Detecting encryption should be as simple as searching for the string `/Encrypt` in the file. – Thomas Jun 24 '17 at 00:47
  • So, in the file buffer (or after reading file as text), just search for '/Encrypt'? Anything else to watch out for? – geoboy Jun 24 '17 at 04:09
  • I think that "Anything else to watch out for?" question is eventually going to require you to understand the format better, or at least the parts of it that can contain encryption/protection flags. Consider a PDF document that includes in its visible content the text "/Encrypt". You might get a false match on it unless you were parsing with the full context of the file format, e.g. "/Encrypt" must appear at beginning of file, inside of a certain section. https://en.wikipedia.org/wiki/Portable_Document_Format – Erik Hermansen Jun 24 '17 at 21:36

3 Answers

3

You can try the solution below:

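// `file` is the File object selected by the user.
const reader = new FileReader();
// Attach the handler before starting the read.
reader.onload = function () {
    const blob = new Blob([reader.result], {type: 'application/pdf'});
    blob.text().then(x => {
        // Loose check: true if "Encrypt" appears anywhere, so false positives are possible.
        console.log("isEncrypted", x.includes("Encrypt"));
        // Narrower check: only look between the last "<<" and the last ">>" (usually the trailer).
        console.log("isEncrypted", x.substring(x.lastIndexOf("<<"), x.lastIndexOf(">>")).includes("/Encrypt"));
        console.log(file.name);
    });
};
reader.readAsArrayBuffer(file);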
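As the comments on the question point out, a bare string search can match "/Encrypt" in places where it does not mean the file is encrypted. Here is a rough sketch of a slightly stricter heuristic that only looks inside trailer dictionaries and cross-reference stream dictionaries, which is where the `/Encrypt` key normally lives. The function name `looksEncrypted` is made up for this example, and this is still a heuristic, not a real PDF parser, so treat the result as a hint.

```javascript
// Heuristic sketch (hypothetical helper, not a full PDF parser).
// `bytes` is a Uint8Array holding the whole file.
function looksEncrypted(bytes) {
  // latin1 maps each byte to one character, so binary stream data
  // cannot make decoding fail and ASCII keywords stay intact.
  const text = new TextDecoder('latin1').decode(bytes);

  // Match the key "/Encrypt" but not longer names such as "/EncryptMetadata".
  const encryptKey = /\/Encrypt(?![A-Za-z0-9])/;

  // Classic layout: "trailer << ... >> startxref".
  const trailerRe = /trailer\s*<<([\s\S]*?)>>\s*startxref/g;
  for (const match of text.matchAll(trailerRe)) {
    if (encryptKey.test(match[1])) return true;
  }

  // Files using cross-reference streams keep the trailer keys in the
  // dictionary of an object marked "/Type /XRef".
  const xrefRe = /\/Type\s*\/XRef\b/g;
  let m;
  while ((m = xrefRe.exec(text)) !== null) {
    const start = text.lastIndexOf('obj', m.index);
    const end = text.indexOf('stream', m.index);
    if (start !== -1 && end !== -1 && encryptKey.test(text.slice(start, end))) {
      return true;
    }
  }
  return false;
}
```

With the `FileReader` code above you could call it as `looksEncrypted(new Uint8Array(reader.result))`. Note that a file can carry an `/Encrypt` entry and still open without a password prompt (for example when only an owner password is set), so whether you reject those depends on your requirements.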
2

It's better for the user experience, bandwidth, and performance to detect the status on the client side. You can have a file input element on your page and handle its change event.

<input type="file" id="pdfFile" size="50" onchange="processFile(event)" />

Inside of the onChange-handling function, you can get at the file bytes and load into a buffer. For code and more details, see reading file contents on the client side in javascript in various browsers.
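A minimal sketch of what such a handler could look like, assuming a browser that supports `Blob.arrayBuffer()`. The name `processFile` matches the input element above; `readFileBytes` and the `%PDF-` header check are my own additions for illustration and are not required by the approach.

```javascript
// Reads a File/Blob (e.g. from an <input type="file">) into a Uint8Array.
async function readFileBytes(file) {
  const buffer = await file.arrayBuffer(); // resolves to an ArrayBuffer
  return new Uint8Array(buffer);
}

// Change handler sketch: returns the file bytes, or null if nothing was selected.
// In a page it would be wired up with something like:
//   document.getElementById('pdfFile').addEventListener('change', processFile);
async function processFile(event) {
  const file = event.target.files[0];
  if (!file) return null; // the user cancelled the file dialog

  const bytes = await readFileBytes(file);

  // Assumption: a strict sanity check that the file starts with "%PDF-".
  const header = String.fromCharCode(...bytes.subarray(0, 5));
  if (header !== '%PDF-') {
    throw new Error('Not a PDF file');
  }
  return bytes; // hand these bytes to whatever encryption check you choose
}
```

From here the bytes can go to a PDF library or to your own check before anything is sent to S3.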

You'll need to do some PDF parsing to learn the locked/encrypted status, but I imagine there are JS libraries that do it. Even for very large PDFs, parsing on the client should generally be faster than uploading the PDF to the server, since upload time grows with file size.

Cases I could see for uploading the file instead of client-side parsing:

  • you are targeting lower-end mobile devices and expect PDFs larger than 100 MB
  • you will be running on browsers with JavaScript restrictions
  • you always want to upload the file to your server even if the PDF is protected, and you've worked out that the user experience is better
Erik Hermansen
  • This is more along the lines I was going, but I am stuck at how to learn about the locked/encrypted status once I have read the File with a `FileReader` object. Any thoughts on that? – geoboy Jun 24 '17 at 04:08
  • I'm not an expert on PDF format, so I won't say too much here. I'd start with a google search for "js pdf parser" and start looking through the libs. Alternatively, you could get a sample set of PDFs, learn the format a little bit, and write a quick and dirty parser yourself. The second way would be more performant, because you can specialize the parsing to just finding the text you need instead of processing everything. – Erik Hermansen Jun 24 '17 at 21:31
1

What you can do is use PDF.js to open the PDF file and try to get the number of pages. When the file is password protected, you get a PasswordException.

Have a look at this fiddle: https://jsfiddle.net/fe6jLgr5/15/

document.getElementById("pdfFile").addEventListener("change",
   function(event) {
      const file = event.target.files[0];
      const reader = new FileReader();
      // Attach the handler before starting the read.
      reader.onload = function(e) {
         const docInitParams = {
            data: e.target.result,
            password: ''
         };
         pdfjsLib.getDocument(docInitParams).promise.then((pdfDocument) => {
            // The document opened, so it is not password protected.
            const numPages = pdfDocument.numPages;
            console.log('Doc not password protected, pages:', numPages);
         }).catch(err => console.log(err)); // a PasswordException ends up here
      };
      reader.readAsArrayBuffer(file);
   }, false);
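The snippet above logs every error the same way, so a corrupt file and a protected file look alike. Here is a small sketch that separates the two by checking the error's `name`. The helper name `isPasswordProtected` is made up, and `pdfjsLib` is passed in as a parameter only so the function does not depend on a global; it is assumed to be the PDF.js library object.

```javascript
// Resolves to true if PDF.js refuses to open the data without a password,
// false if it opens, and rethrows any other error (e.g. a corrupt file).
// `pdfjsLib` is assumed to be the PDF.js library object; `data` is an
// ArrayBuffer or typed array with the file contents.
async function isPasswordProtected(pdfjsLib, data) {
  try {
    const pdfDocument = await pdfjsLib.getDocument({ data }).promise;
    await pdfDocument.destroy(); // release the document once we know it opened
    return false;
  } catch (err) {
    if (err && err.name === 'PasswordException') {
      return true;
    }
    throw err; // not a password problem, let the caller decide
  }
}
```

Inside `reader.onload` it could be used as `isPasswordProtected(pdfjsLib, e.target.result).then(locked => ...)`. As far as I know, a PDF that only has an owner password (restrictions but no open password) will open fine in PDF.js, so this check will not flag those files.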
lomaky