I am having trouble handling asynchronous data with Promises.

I have created a function that takes a PDF stream and uses pdf.js to parse the document, its pages, and their annotations, extracting the values I need into an array.

The issue is that after I loop through the pages and each annotation element, the variable in the outer scope has still not been updated.

The goal is to have dataArray populated with the records of each element of each page in the PDF document.

My code is below:

const pdfjs = require('pdfjs-dist');

function parsePdf(data) {
    let dataArray = [];
    pdfjs.getDocument(data)
      .then(pdf => {
        for (let i = 1; i <= pdf.numPages; i++) {
          pdf.getPage(i)
            .then(page => {
              page.getAnnotations()
                .then(items => {
                  items.forEach(element => {
                    let obj = {};
                    if (element.fieldName) {
                      obj.id = element.id;
                      obj.name = element.fieldName;
                      obj.value = element.fieldValue;
                      obj.isCheckbox = element.checkBox;
                      obj.isRadioButton = element.radioButton;
                      obj.isReadOnly = element.readOnly;
                      obj.type = element.fieldType;
                      dataArray.push(obj);
                    }
                  });
                  // Logs the populated values; see the example output below
                  console.log(dataArray);
                });
            });
        }
      });
    // Still an empty array [] at this point
    console.log(dataArray);
    return dataArray;
}

I'd like to understand how I can get the values collected in the inner forEach scope into the outer variable in the function. I know it has to do with how Promises handle the response, but I just can't figure out how to chain the .then() calls.

Example of what each inner console.log(dataArray) call prints:

[
  {
    id: '59R',
    name: 'Name_es_:signer:signature',
    value: 'Jon Doe',
    isCheckbox: undefined,
    isRadioButton: undefined,
    isReadOnly: false,
    type: 'Tx'
  },
  {
    id: '62R',
    name: 'Address',
    value: '1 Main Street',
    isCheckbox: undefined,
    isRadioButton: undefined,
    isReadOnly: false,
    type: 'Tx'
  }
]

Thank you for your help!

Zvika Badalov
  • pdfjs.getDocument(data) is asynchronous - you are returning the result synchronously - see https://stackoverflow.com/questions/23667086/why-is-my-variable-unaltered-after-i-modify-it-inside-of-a-function-asynchron – Jaromanda X Oct 04 '17 at 16:14
  • Return `dataArray` inside the `then` function, and return the Promise from `parsePdf` (`return pdfjs.getDocument(data).then(...)`) and finally use the Promise result from `parsePdf`: call it with `parsePdf(...).then(...)` – apsillers Oct 04 '17 at 16:18
  • @apsillers, maybe I don't follow, but just to confirm: `dataArray` is returned at which `then()` the annotation, page, or document? – Zvika Badalov Oct 04 '17 at 16:42
  • @tbadlov Ahh, I see -- the idiomatic Promise solution is harder than I suggested above because you have nested `then` calls. You can still add a callback to `function parsePdf(data, callback) { ...`, and call `callback(dataArray)` when `dataArray` is visible, and call the modified function with `parsePdf(something, function(finalResult) { ... })` – apsillers Oct 04 '17 at 17:08
  • @apsillers, ok, at what point will dataArray be ready though? – Zvika Badalov Oct 04 '17 at 17:52
  • @tbadlov Where you successfully `console.log` it, right? Just call the `callback` at that point. – apsillers Oct 04 '17 at 17:53
  • By the way, here's my attempt at a more Promise-idiomatic solution using `Promise.all`: https://jsfiddle.net/65o2Lbcw/ I haven't tested it, though. Basically, get all your `getPage` promises in an array (which gets passed into `Promise.all`), and make sure you `return` your `getAnnotations` promises, so the `all` doesn't resolve until all the child `getAnnotations` promises resolve. Finally, tack a `then` onto the `all` call that returns `dataArray`. – apsillers Oct 04 '17 at 18:05
  • @apsillers, thank you so much, that did it. Now I can investigate why it worked which is really what I am trying to achieve! – Zvika Badalov Oct 04 '17 at 18:29
  • Did the Promise-based solution (from the jsfiddle link) work, or did you use the callback solution that I posted in the comments? (Just curious if my untested Promise solution works or not, if you used it.) – apsillers Oct 04 '17 at 18:59
  • Your solution worked. Had to clean up a bit, but yeah, the promise collection did the trick. – Zvika Badalov Oct 04 '17 at 19:55
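
For readers who cannot open the fiddle, the `Promise.all` approach described in the comments could look roughly like the sketch below. It is not the code from the jsfiddle link, only an illustration of the same idea. The helper name `collectFields` and the `fakePdf` stand-in are hypothetical; the only assumption about pdf.js is that the document exposes `numPages` and `getPage(n)`, and that each page exposes `getAnnotations()`, as in the question.

```javascript
// Collect the form-field annotations from every page of a loaded document.
// `pdf` is assumed to look like the object used in the question:
// it has `numPages`, and `getPage(n)` returns a promise for a page.
function collectFields(pdf) {
  const pagePromises = [];
  for (let i = 1; i <= pdf.numPages; i++) {
    pagePromises.push(
      pdf.getPage(i)
        // Returning the inner promise keeps it in the chain, so the
        // outer promise waits for the annotations to arrive.
        .then(page => page.getAnnotations())
        .then(items => items
          .filter(element => element.fieldName)
          .map(element => ({
            id: element.id,
            name: element.fieldName,
            value: element.fieldValue,
            isCheckbox: element.checkBox,
            isRadioButton: element.radioButton,
            isReadOnly: element.readOnly,
            type: element.fieldType
          })))
    );
  }
  // Promise.all resolves once every page promise has resolved and keeps
  // page order; the per-page arrays are then flattened into one array.
  return Promise.all(pagePromises)
    .then(perPage => [].concat(...perPage));
}

// Hypothetical stand-in for a loaded document, for demonstration only.
const fakePdf = {
  numPages: 2,
  getPage: n => Promise.resolve({
    getAnnotations: () => Promise.resolve([
      { id: n + 'R', fieldName: 'field' + n, fieldValue: 'v' + n, readOnly: false, fieldType: 'Tx' },
      { id: 'link' + n } // no fieldName, so it is skipped
    ])
  })
};

collectFields(fakePdf)
  .then(fields => console.log(fields.map(f => f.name))); // logs [ 'field1', 'field2' ]
```

With the question's code, `parsePdf` would then return the promise instead of the array, for example `return pdfjs.getDocument(data).then(collectFields);`, and the caller would read the result with `parsePdf(data).then(fields => { ... })`. Depending on the pdf.js version, the promise may need to be taken from `getDocument(data).promise` instead; check the release you are using.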
