
I want to make sure that the file I am using is really a PDF and not another kind of file with a .pdf extension.

In my code, the file is held in a File object called file. The file.type property shows "application/pdf" for any file with a .pdf extension, regardless of what the file really contains, so it is not a good enough check.

If I read the contents of the file using a FileReader:

let reader = new FileReader();
reader.onloadend = function() {
  console.log(reader.result.includes("%PDF"));
};
reader.readAsBinaryString(file);

then I can see true or false on the console, depending on whether it is really a PDF file. What I am stuck on with this method is that I cannot get any usable data out of the function.

So is there a way I can use the result of reader.onloadend = function() {} in the rest of my code, or is there a better way to validate the file?

Many Thanks

EDIT: Thanks, everyone, for the help with validating a PDF. I am pretty happy with that part now. The problem I still have is getting the contents of the file, or the result of the validation, into a variable I can use. I am having problems with promises etc.

So I now have:

function readFile(file){
  return new Promise((resolve, reject) => {
    const fr = new FileReader();  
    fr.onload = () => {
      if (fr.result.includes("PDF")){  // i will use the new pdf validation here! 
        resolve("OK");
      } else {
        reject("Error");
      }
    };
    fr.onerror = reject;
    fr.readAsBinaryString(file);
  });
}

Then I call this with:

readFile(file).then(
    function(value) {console.log(value);},
    function(error) {console.log(error);}
  );

but I still cannot get this data into a variable I can use.

For example, I would like something like this:

var state;
  readFile(file).then(
    function(value) {state = value;},
    function(error) {state = error;}
  );
if (state == "OK"){...

but 'state' is just showing as undefined.

  • your function should return a `Promise` which resolves to true or false inside the `onloadend` handler. Other functions should `await` your function – gog Sep 29 '22 at 14:08
  • @KJ, Thanks, I have looked into this and the if statement now looks like this: `if (reader.result.slice(0, 5).includes("%PDF-") && reader.result.slice(-6).includes("%EOF") && reader.result.includes("xref"))` but I still need to work out how to get the result out of the function. – user3714154 Sep 29 '22 at 16:01

1 Answer


For further verification you could test the first 8 characters against some regex. I found some good information about validating a PDF file in this article from 2013. There is a further link to making your own PDF file if you want to learn what makes a PDF file valid.

const reader = new FileReader();
reader.addEventListener('load', (event) => {
  // the header (e.g. "%PDF-1.7") sits in the first 8 characters
  const fileTypeHeader = reader.result.substring(0, 8);
  const regexTest = /%PDF-1\.[0-7]/;
  console.log(fileTypeHeader);
  // testing for text in file header
  if (regexTest.test(fileTypeHeader)) {
    alert('this is a valid pdf');
  } else {
    alert('there is something off about this pdf');
  }
});
// start the read; `file` is the File object from the question
reader.readAsText(file);
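The header test above does not need the whole file in memory. As a rough sketch, assuming a browser that supports `Blob.slice()` and `Blob.arrayBuffer()`, you could read only the first 8 bytes and check those. The names `hasPdfHeader` and `isPdfFile` are made up for this example, and the regex here is looser than the one above: it accepts any single-digit `major.minor` version (so `%PDF-2.0` passes too), which is my own assumption rather than anything from the linked article.

```javascript
// Hypothetical helper: true if the bytes start with a PDF header such as "%PDF-1.7".
function hasPdfHeader(bytes) {
  // Map each of the first 8 bytes to one character (Latin-1 style decoding).
  const header = String.fromCharCode(...bytes.slice(0, 8));
  // Assumption: accept any single-digit major.minor version, not only 1.0 to 1.7.
  return /^%PDF-\d\.\d/.test(header);
}

// Hypothetical wrapper: reads only the first 8 bytes of a File or Blob.
async function isPdfFile(file) {
  const buffer = await file.slice(0, 8).arrayBuffer();
  return hasPdfHeader(new Uint8Array(buffer));
}
```

`isPdfFile(file)` returns a promise that resolves to true or false, so it still has to be consumed with `.then()` or `await`. A header check like this only tells you the file starts like a PDF, not that the rest of it is well formed.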

Then, in order to preserve your results, you can return them in a promise. I found another SO answer that had this simple snippet.

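function readFile(file){
  return new Promise((resolve, reject) => {
    const fr = new FileReader();
    // resolve with the file contents once reading has finished
    fr.onload = () => {
      resolve(fr.result);
    };
    fr.onerror = reject;
    // pass the File object itself; a File has no `blob` property
    fr.readAsText(file);
  });
}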
Jordan
  • Thanks, I have tried this and the results are returned as `Promise {}[[Prototype]]: Promise[[PromiseState]]: "fulfilled"[[PromiseResult]]: "%PDF-1.7\r%âãÏÓ\r\n405 0 obj\r< – user3714154 Sep 29 '22 at 16:45
  • @user3714154 If you are getting a promise back from the readFile I think you should be able to call `.then()`. Similar to how it is set up in the docs https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/resolve – Jordan Sep 29 '22 at 18:10
  • @KJ Good point, that's good to know. I saw that the article that I referenced was from 2013 [9 years] so I figured some of those versions were going to be a little out of date. Do you have an opinion on when and how a PDF file should absolutely be validated? – Jordan Sep 29 '22 at 18:24
  • @Jordan thanks, But following that doc only lets me use the results in the then function. I need the results outside of any functions and in a variable, any suggestions? – user3714154 Sep 29 '22 at 18:31
  • @user3714154 Yup, if you prepend the function with `async` and use `await` on the variable assignment you should get some values that you want to use: `var results = await readFile(file);`. This should wait for the promise to resolve before moving on to the next line. This can lead to other issues because it's pretty basic; if you aren't familiar with async/await or promises I would block out some time on a Saturday morning and play around with them. – Jordan Sep 29 '22 at 19:13
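
To make the `async`/`await` suggestion in the last comment concrete, here is a minimal sketch of getting the result into a variable. It keeps the question's contract (resolve with "OK", reject otherwise), but this `readFile` uses `Blob.text()` instead of `FileReader` purely to keep the example short, and `handleFile` is a made-up name. The reason `state` was undefined in the question is that the `if` ran before the promise had settled; `await` pauses the surrounding `async` function until it has.

```javascript
// Assumption: same contract as readFile in the question, written with Blob.text().
async function readFile(file) {
  const header = await file.slice(0, 5).text();
  if (header === "%PDF-") {
    return "OK"; // the promise resolves with "OK"
  }
  throw new Error("Not a PDF"); // the promise rejects
}

// Hypothetical caller: code that needs the result must live in an async function.
async function handleFile(file) {
  let state;
  try {
    state = await readFile(file); // waits here until the promise settles
  } catch (error) {
    state = "Error";
  }
  if (state === "OK") {
    console.log("File looks like a PDF");
  }
  return state;
}
```

Note that `handleFile` itself returns a promise, so there is no way to make the value available synchronously at the top level of a classic script. The usual pattern is to put everything that depends on the result inside an `async` function (for example, an `async` event handler for the file input).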