I'm slightly confused about how to use promises. I've read a bit about them, mainly because it seems I have to use them: I'm working on a small application that searches through some PDFs using pdfjs, whose API returns promises. I put something together in Node.js by adapting various examples from the net, but I've run into a problem.

Let's look at the code first:

require('pdfjs-dist');
var fs = require('fs');

//var searchTerm = "course";
var searchTerm = "designee";
//var searchTerm = "document";
var wordCounter = 0;
var searchResultJSON = [];
//var data = new Uint8Array(fs.readFileSync('testPdf.pdf'));
//var data = new Uint8Array(fs.readFileSync('advanced-javascript.pdf'));
var data = new Uint8Array(fs.readFileSync('iss4.pdf'));
PDFJS.getDocument(data).then(function (pdfDocument) {
  console.log('Number of pages: ' + pdfDocument.numPages );
  //var div = document.getElementById('viewer');
  for(var i = 1; i<=pdfDocument.numPages; i++ ){//loops thru pages
      console.log("i is " + (i));
      pdfDocument.getPage((i)).then(function(page){//get page(i), 
         // console.log("page is " + (i));
          //console.log("inside getPage()");
          page.getTextContent().then( function(textContent){//get content of pdf
            //console.log("inside getTextContent()");  
            //if( null != textContent.items ){
                var page_text = "";
                var last_block = null;
                var lineWithResult = "";

                for( var k = 0; k < textContent.items.length; k++ ){
                    var block = textContent.items[k];
                    //console.log("word " + textContent.items.length + " k is " + k );
                    /* if( last_block != null && last_block.str[last_block.str.length-1] != ' '){
                        if( block.x < last_block.x )
                            page_text += "\r\n"; 
                        else if ( last_block.y != block.y && ( last_block.str.match(/^(\s?[a-zA-Z])$|^(.+\s[a-zA-Z])$/) == null ))
                            page_text += ' ';
                    } */

                    page_text += block.str;

                    last_block = block;
                    lineWithResult = searchPdf(block.str);
                    if(lineWithResult != null){
                        console.log(lineWithResult + " wordCounter is " + wordCounter);

                    }

                }//end of for(var k...)
                    //console.log(" page_text " + page_text);
                    //console.log(searchResultJSON);

            //}
          });//end of textContent.items

      });//end of getPage

  }//end of loop      
});
function searchPdf(toSearch){//searching pdf for searchTerm
    var result = toSearch.toLowerCase().indexOf(searchTerm);
    if(result >=0){//if match is found
        wordCounter++;
        //console.log("toSearch " + toSearch + " result is " + result + " wordCounter " + wordCounter);
        constructResult(toSearch, result);//build the result object
        return toSearch;
    }
    else{//if match not found
        return null;
    }

}
function constructResult(toSearch, result){//construct array of objects containing: search term, search result and index of search term
    searchResultJSON.push({
        "TextLine":toSearch,
        "SearchTerm":searchTerm,
        "Result": result,               
    });     
} 

The purpose of this code is to:

  • loop through the pdf's pages

  • loop through the content

  • get the pdf text in a variable line by line

  • search the pdf content with a keyword

  • if the keyword finds a match, print the match

  • get the matches in a javascript object

So, it all works OK. You'll notice that inside the second for loop (where I get the text of the PDF) I call a function, searchPdf(), which performs the search, and from within that function I call another function, constructResult(...), which is supposed to build the JavaScript object with the results.

I have a problem printing this object, though: if I print it outside the for loop, it is empty, because the print call (in my case console.log) executes before the loop has actually read and processed the text and found any matches. So promises seem to be the way to resolve the problem. The thing is, I'm not sure how to code this so that I can chain the promises and print my object after everything has executed. Any ideas?

EDIT: to clarify, what I need, in sequence, is this: 1) loop through the PDF (I will have to amend the code to loop through a collection of PDFs at some point soon); 2) get each line of text; 3) check whether there is a match; 4) if so, copy the line of text into the JavaScript object; 5) print the JavaScript object.
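
For what it's worth, the timing issue can be reproduced without pdfjs at all. This is only a minimal sketch: `fakeGetText` is a made-up stand-in for an asynchronous call such as `getTextContent()`, and the text it resolves to is invented for illustration.

```javascript
var results = [];

// Hypothetical stand-in for an async call like getTextContent()
function fakeGetText() {
    return Promise.resolve('a line containing designee');
}

// The callback passed to .then runs later, never synchronously
var pending = fakeGetText().then(function (text) {
    results.push(text);
});

// This runs first: the .then callback above has not fired yet
var lengthOutside = results.length;
console.log('outside: ' + lengthOutside);

// Work chained onto the promise runs only after the push has happened
pending.then(function () {
    console.log('after promise: ' + results.length);
});
```

The first log sees an empty array and the second sees one entry, which is the same behaviour I get when printing searchResultJSON outside the loop.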

antobbo
  • Possible duplicate of [Replacing callbacks with promises in Node.js](http://stackoverflow.com/questions/28432401/replacing-callbacks-with-promises-in-node-js) – BrTkCa Nov 17 '16 at 21:33
  • you could create a print function that gets called from inside the loop and return the result to a var outside the loop – Dex Dave Nov 17 '16 at 21:46
  • @DexDave, tried that already, it doesn't work because the variable will be empty, I need all the operations to be executed first, then return the results to that variable – antobbo Nov 17 '16 at 22:08
  • not posting an answer as I'm not really sure what the actual question is, and I'm also not about to "explain how promises work". Having said that [this fiddle](https://jsfiddle.net/na66sfor/) is a minor rewrite of your existing code - if I've done it right, at the point where the comment "all done here" is, you should be able to "print" the `searchResultJSON` object – Jaromanda X Nov 17 '16 at 22:27
  • Thanks @JaromandaX, I'm looking at the code, I think there is a bracket missing somewhere as it returns `SyntaxError: missing ) after argument list`, trying to figure that out. Are those => lambda expressions? – antobbo Nov 18 '16 at 08:39
  • @antobbo - that code will never run in jsfiddle - but having said that, there was an extra `;` that shouldn't be there - update [fiddle](https://jsfiddle.net/na66sfor/1/) ... => are "arrow functions" - [this fiddle](https://jsfiddle.net/na66sfor/2/) is the "old school" javascript for the same code - I thought since you're using node, use the new syntax – Jaromanda X Nov 18 '16 at 08:44
  • oh OK, thanks it works, I will try to understand the construct a bit better now and then try to implement the logic to loop through more than one pdf – antobbo Nov 18 '16 at 08:50

1 Answer


Try something like this:

function search(doc, s) {
    var allPages = [],
        i;

    // i is already declared above, so don't redeclare it here
    for (i = 1; i <= doc.numPages; i++) {
        allPages.push(doc.getPage(i));
    }

    // Promise.all returns a promise that resolves once 
    // each promise inside allPages has been resolved
    return Promise.all(allPages)
    // pages now contains an array of pages, loop over them
    // using map, return the promise to get the content for each page
    // return it through Promise.all so we can run the next phase
    // once the text is resolved for each page
    .then(pages => Promise.all(pages.map(p => p.getTextContent())))
    // We now have an array of contents for each page, filter based
    // on the passed string
    .then(content => content.filter(c => c.indexOf(s) > -1));
}

// This is a mock of the pdf API used in your question
var pdfDocument = {
    numPages: 3,
    getPage: function(i) {
        return Promise.resolve({
            getTextContent: function() {
                return Promise.resolve('Page ' + i);
            }
        });
    }
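};

// Example call: print the matches once every page has been processed
search(pdfDocument, 'Page 2').then(function (matches) {
    console.log(matches);
});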
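
The mock above resolves each page's content to a plain string, whereas the code in the question reads `textContent.items[k].str`. Below is a rough sketch of the same `Promise.all` pattern adapted to that shape. It is not the real pdfjs API: `makeMockDocument` and `searchDocument` are hypothetical names, and the mock only imitates the `items`/`str` structure the question's loop relies on.

```javascript
// Hypothetical mock: getTextContent() resolves to { items: [{ str: ... }] },
// the shape the question's loop reads from. Not the real pdfjs objects.
function makeMockDocument(pagesOfLines) {
    return {
        numPages: pagesOfLines.length,
        getPage: function (i) {
            return Promise.resolve({
                getTextContent: function () {
                    return Promise.resolve({
                        items: pagesOfLines[i - 1].map(function (line) {
                            return { str: line };
                        })
                    });
                }
            });
        }
    };
}

function searchDocument(doc, term) {
    var pagePromises = [];
    for (var i = 1; i <= doc.numPages; i++) {
        // Returning the inner promise from .then chains it,
        // so there is no need for `new Promise` anywhere
        pagePromises.push(doc.getPage(i).then(function (page) {
            return page.getTextContent();
        }));
    }
    // Resolves only once the text of every page is available
    return Promise.all(pagePromises).then(function (contents) {
        var results = [];
        contents.forEach(function (textContent) {
            textContent.items.forEach(function (block) {
                var index = block.str.toLowerCase().indexOf(term);
                if (index >= 0) {
                    results.push({ TextLine: block.str, SearchTerm: term, Result: index });
                }
            });
        });
        return results;
    });
}

var mockDoc = makeMockDocument([
    ['The designee signs here', 'Nothing relevant'],
    ['Another line', 'Each Designee must attend']
]);

// The results are printed only after every page has been searched
var searchDone = searchDocument(mockDoc, 'designee').then(function (results) {
    console.log(results);
    return results;
});
```

Assuming `getPage` and `getTextContent` behave as they do in the question, the same function could presumably be called from inside the `PDFJS.getDocument(data).then(...)` callback by returning `searchDocument(pdfDocument, searchTerm)` and logging in the next `.then`.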
Evan Trimboli
  • Even if the code works, OP asked for an explanation. I think it would be useful if you explain what is happening. – Marcs Nov 17 '16 at 21:57
  • Might want to explain that **pdfDocument** here is a mock up of the OP's version. It might also be a good idea to explain things a bit. – JonSG Nov 17 '16 at 21:58
  • While `Promise.all` will be part of the solution, this answer isn't the best use of Promises. There's no need for `new Promise` at all, nor any need for `Promise.resolve` – Jaromanda X Nov 17 '16 at 22:03
  • thanks for the code, but yes I'm still a bit confused about how to use these promises. @JaromandaX touched this problem a bit, the tutorials usually create a new promise but in the pdfjs that doesn't happen, which is rather confusing... – antobbo Nov 17 '16 at 22:10
  • Agreed, there's no need to create a new promise, I didn't refactor it from messing around. Not sure why you say `Promise.resolve()` isn't needed. The OP seems to imply `getPage` and `getTextContent` return promises, so I'm trying to mimic that API. – Evan Trimboli Nov 17 '16 at 22:55
  • @EvanTrimboli - yeah, didn't realise the purpose of those Promise.resolve were to mimic – Jaromanda X Nov 18 '16 at 08:47