The question is: how can I get the name of the PDF file using pdf.js? I'm running a variation of a pdf.js example from Node, and I was wondering whether this is possible at all. I've searched through pdf.js's docs and source, but couldn't find anything obvious. I'm using the code below, which so far prints the number of pages of each PDF file found in a given folder (in this case, the directory the code is run from):

var fs = require('fs');
var glob = require('glob');

global.window = global;
global.navigator = { userAgent: "node" };
global.PDFJS = {};
global.DOMParser = require('./domparsermock.js').DOMParserMock;

require('../../build/singlefile/build/pdf.combined.js');
glob("**/*.pdf", function (er, files) {
  for (var i = 0; i < files.length; i++) {
    var data = new Uint8Array(fs.readFileSync(files[i]));
    PDFJS.getDocument(data).then(function (doc) {
      var numPages = doc.numPages;
      console.log('Number of Pages: ' + numPages);
      console.log();
    }).then(function () {
      console.log('# End of Document');
    }, function (err) {
      console.error('Error: ' + err);
    });
  }
});

I thought the name of the file was in the doc object as an attribute or something like that, but that doesn't seem to be the case here, and I couldn't find anything about this in the docs. Is there something I'm missing or doing wrong here?

Jose Ramirez
  • You can grab the file name with normal Node.js stuff. Where is the file coming from: is it a request, or are you searching for it in a directory, etc.? – user2879041 Jun 19 '15 at 19:08
  • @user2879041 - please see my edited question – Jose Ramirez Jun 19 '15 at 19:35
  • The filenames are in `files[i]`, unsure what you are asking... – Ruan Mendes Jun 19 '15 at 19:37
  • @JuanMendes But they appear as undefined when I try to use them inside the function of the first then() call (where I get the page count). Even though I have them, I can't associate each filename with its corresponding page count. Not with the code as it is right now that is. – Jose Ramirez Jun 19 '15 at 19:42
  • ... well, I've found something: If I try to do `console.log('Number of Pages: ' + numPages + ', filename: ' + files[i]);` it outputs 'Number of Pages: 2, filename: undefined' because inside the anonymous function, the i used as index is always 2, so this seems to be something of a scope or js closure issue. – Jose Ramirez Jun 19 '15 at 20:17
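
The scope issue described in the last comment can be reproduced without pdf.js. Below is a minimal sketch, assuming a hypothetical `fakeGetDocument` helper that stands in for `PDFJS.getDocument` and does nothing but resolve asynchronously. The file names are made up for illustration.

```javascript
// Hypothetical stand-in for PDFJS.getDocument(): resolves asynchronously
// with an object that has a numPages property.
function fakeGetDocument(name) {
  return Promise.resolve({ numPages: name.length });
}

// Same shape as the loop in the question: a `var` index shared by every callback.
function collectWithVarLoop(files) {
  var seen = [];
  var pending = [];
  for (var i = 0; i < files.length; i++) {
    pending.push(fakeGetDocument(files[i]).then(function () {
      // This callback runs only after the loop has finished,
      // so i === files.length and files[i] is undefined.
      seen.push(files[i]);
    }));
  }
  return Promise.all(pending).then(function () {
    return seen;
  });
}

collectWithVarLoop(['a.pdf', 'b.pdf']).then(function (seen) {
  console.log(seen); // logs [ undefined, undefined ]
});
```

Every callback reads the same `i`, and by the time any of them runs the loop has already pushed it past the last valid index.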

1 Answer

I fixed it :) the code looks like this now:

var fs = require('fs');
var glob = require('glob');

global.window = global;
global.navigator = { userAgent: "node" };
global.PDFJS = {};
global.DOMParser = require('./domparsermock.js').DOMParserMock;

require('../../build/singlefile/build/pdf.combined.js');
glob("**/*.pdf", function (er, files) {
  // This is the essential change: use forEach() instead of the for loop
  files.forEach(function (file) {
    var data = new Uint8Array(fs.readFileSync(file));
    PDFJS.getDocument(data)
      .then(function (doc) {
        var numPages = doc.numPages;
        console.log('File name: ' + file + ', Number of Pages: ' + numPages);
        console.log();
      });
  });
});

Hope it helps someone, and thanks for the quick replies :)

Jose Ramirez
  • The reason `files[i]` didn't work inside the loop is that the code inside `then()` runs asynchronously. By the time those callbacks run, the loop has already finished, so `i` is equal to `files.length` and `files[i]` is undefined. That is, the single `i` variable was being shared by all your `then` callbacks. By using `forEach`, you're creating a separate closure for each file! Good job figuring it out on your own. See http://stackoverflow.com/questions/18560708/javascript-closures-access-in-loop-to-current-i-j-variables – Ruan Mendes Jun 19 '15 at 20:43
  • Off-topic: you could use the *asynchronous* `fs.readFile` and "promisify" your whole pipeline: start with a promise for reading the file, `.then` make a `Uint8Array` and parse the PDF document, `.then` process the document. Right now, it's a bit weird to have synchronous file reads with asynchronous processing. – Mattias Buelens Jun 19 '15 at 20:51
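
The promisified pipeline suggested in the last comment could look roughly like the sketch below. The helper names `readFileAsync`, `countPages` and `countAll` are made up for illustration, and the parser is passed in as a `parse` argument so that the sketch does not depend on pdf.js itself. With the setup from the answer, `parse` would presumably be `function (data) { return PDFJS.getDocument(data); }`.

```javascript
var fs = require('fs');

// Wrap the callback-based, asynchronous fs.readFile in a promise.
function readFileAsync(path) {
  return new Promise(function (resolve, reject) {
    fs.readFile(path, function (err, buffer) {
      if (err) {
        reject(err);
      } else {
        resolve(buffer);
      }
    });
  });
}

// Read one file, parse it, and pair the page count with the file name.
// `file` is a function parameter, so each call has its own binding.
function countPages(file, parse) {
  return readFileAsync(file)
    .then(function (buffer) {
      return parse(new Uint8Array(buffer));
    })
    .then(function (doc) {
      return { file: file, numPages: doc.numPages };
    });
}

// Process every file; resolves with results in the same order as `files`.
function countAll(files, parse) {
  return Promise.all(files.map(function (file) {
    return countPages(file, parse);
  }));
}
```

Inside the `glob` callback, one could then call `countAll(files, parse)` and log each `file` and `numPages` pair in a final `.then`, with a single rejection handler for read or parse errors. Because `files.map` invokes the callback once per file, this keeps the same per-file closure that `forEach` provides.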