
I want to get the following workflow:

  • Get the URLs of some images
  • Pass those URLs to another method that downloads them and returns the downloaded paths
  • Use those paths to load the images and create a PDF file using pdfkit

After a lot of tinkering, my code looks like this:

nightmare 
    .goto(url) 
    .wait('body') 
    .evaluate( ()=>document.querySelector('body').innerHTML) 
    .end() 
    .then((response) => { 
        return getJSONData(response);
    })
    .then((data) => {
        return processJSONData(data);
    })
    .then((pages) => {
        return createFile(pages);
    }) 
    .catch(err => { 
        console.log(err);  
    });

getJSONData(data) uses cheerio to parse the HTML, and processJSONData(data) uses image-downloader to download the images. As I understand it, because this is a promise chain, the work inside each then() may be asynchronous, but the then() callbacks themselves run sequentially. When I run the code, the first two then() callbacks do run in order, but createFile(pages) is executed immediately and produces a corrupt PDF file. What could be the reason? How can I ensure that each then() runs only after the promise from the previous then() has resolved?
The complete code can be found here.

Roamer-1888
srdg
  • Your `processJSONData()` is doing async things but does not wait for their results. You only wait for partial results within the `forEach()`, not for all of them to finish. – Sirko Dec 28 '20 at 08:12
  • How can I wait for them to finish? Any ideas? – srdg Dec 28 '20 at 08:13
  • It doesn't look like you're returning a promise from `processJSONData`. – badsyntax Dec 28 '20 at 08:14
  • @badsyntax please have a look at the commented block in `processJSONData`. Even if I returned the `promises` variable, it returns immediately with a promise object and still creates a corrupt PDF. – srdg Dec 28 '20 at 08:16
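
The behaviour described in the last comment can be reproduced in isolation. The sketch below is only an illustration: `fakeDownload` is a hypothetical stand-in for a real download, not part of image-downloader. It shows that a `then()` callback returning an array of promises hands that array straight to the next `then()`, while returning `Promise.all(...)` makes the chain wait.

```javascript
// Hypothetical stand-in for an asynchronous download: resolves after `ms` milliseconds.
const fakeDownload = (name, ms) =>
  new Promise(resolve => setTimeout(() => resolve(`${name}.jpg`), ms));

const log = [];

const demo = Promise.resolve(['a', 'b'])
  // An array is not a promise, so the chain does not wait for its elements.
  .then(names => names.map(name => fakeDownload(name, 20)))
  .then(value => {
    const allPromises = value.every(item => item instanceof Promise);
    log.push(allPromises ? 'array of promises' : 'filenames');
    // Promise.all turns the array into one promise the chain can wait on.
    return Promise.all(value);
  })
  .then(files => {
    log.push(files.join(','));
    return log;
  });

demo.then(result => console.log(result));
```

The first entry in `log` records that the second callback received promises rather than filenames; only after `Promise.all` does the next callback see the resolved values.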

1 Answer

processJSONData() does not wait for all of its async operations to finish, only for parts of them. Consider changing it as follows:

function processJSONData(data){

    // map() instead of forEach() to get a promise per request
    const reqs = data.map(element => {
        // return the inner promise chain to be collected
        return download.image(element)
        .then( ({filename}) => {
            console.log("Saved to ",filename); 
            return filename;
        });
    });

    // return a promise that waits for all of them to be finished
    return Promise.all( reqs );

}
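
To see the difference without touching the network, here is a minimal self-contained sketch. It assumes a hypothetical `fakeDownloadImage()` that mimics `download.image()` by resolving with an object containing `filename`; the URLs are placeholders.

```javascript
// Hypothetical stand-in for download.image(): resolves with { filename } after a short delay.
function fakeDownloadImage(url) {
  return new Promise(resolve =>
    setTimeout(() => resolve({ filename: url.split('/').pop() }), 10)
  );
}

// forEach() discards the promises, so this returns before any download has finished.
function processWithForEach(urls) {
  const saved = [];
  urls.forEach(url => {
    fakeDownloadImage(url).then(({ filename }) => saved.push(filename));
  });
  return saved; // still empty when returned
}

// map() keeps one promise per URL; Promise.all() waits for every one of them.
function processWithPromiseAll(urls) {
  return Promise.all(
    urls.map(url => fakeDownloadImage(url).then(({ filename }) => filename))
  );
}

const urls = ['https://example.com/1.png', 'https://example.com/2.png'];

const early = processWithForEach(urls).length; // 0, nothing has finished yet
const finished = processWithPromiseAll(urls);  // resolves to ['1.png', '2.png']

finished.then(files => console.log(early, files));
```

`Promise.all()` resolves with the results in the same order as the input array, regardless of which download completes first, so the page order for the PDF is preserved.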
Sirko
  • Thank you @Sirko, this answer clears up a lot of doubt. I was returning `reqs` all this while instead of a `Promise.all(reqs)` which was the problem. Just one more question, though, could there be a possible explanation as to why sometimes the downloading hangs on a few particular images? And is there a way to force resolve it? I used [this](https://stackoverflow.com/a/65463384/) but even then the process does not end, and if the countdown is too small, ends immediately. – srdg Dec 28 '20 at 10:35
  • @srdg I have no immediate idea. My guess would be that you're overwhelming the server. Currently all your requests are sent at the same time, so maybe the server blocks you after a few images. Maybe consider something like [bottleneck](https://www.npmjs.com/package/bottleneck) to run only a handful of requests in parallel. – Sirko Dec 28 '20 at 14:36
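
As a rough illustration of running only a handful of requests in parallel, the sketch below uses a small hand-written helper instead of the bottleneck package, so none of it reflects bottleneck's API. `mapWithLimit` and `fakeLimitedDownload` are hypothetical names introduced for this example.

```javascript
// Run task(item) for every item with at most `limit` tasks in flight at once.
// Results come back in the same order as `items`.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index until none are left.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index]);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Hypothetical stand-in for a download that also tracks how many run concurrently.
let inFlight = 0;
let maxInFlight = 0;
function fakeLimitedDownload(name) {
  return new Promise(resolve => {
    inFlight++;
    maxInFlight = Math.max(maxInFlight, inFlight);
    setTimeout(() => {
      inFlight--;
      resolve(`${name}.jpg`);
    }, 10);
  });
}

const limited = mapWithLimit(['a', 'b', 'c', 'd', 'e'], 2, fakeLimitedDownload);
limited.then(files => console.log(files, 'max in flight:', maxInFlight));
```

In `processJSONData()`, the same helper could take the place of `Promise.all(data.map(...))`, with the real download call passed as `task`. Whether this cures the hanging downloads depends on whether the server really is throttling, which remains a guess.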