2

I have an issue using promises with Node.js. I am using cheerio and request-promise to do web scraping, and I want to write my results to a CSV file once all asynchronous functions have been executed, using the Promise.all(promises).then(...) syntax. It was working fine but suddenly the program finishes without any error or rejects but without executing the then(...) part (no file and no log neither). Here is my code:

const rp = require('request-promise');
const cheerio = require('cheerio');
const csv = require('fast-csv');
const fs = require('fs');

var results = [];
var promises = [];

function getResults(district, branch) {

    for (let i = 65; i <= 90; i++) {

        let letter = String.fromCharCode(i);

        let generalOptions = {
            uri: 'https://ocean.ac-lille.fr/publinet/resultats?previousValCri1=' + branch + '&previousValCri0=0' + district + '&previousIdBaseSession=pub_24&actionId=6&valCriLettre=' + letter,
            transform: function (body) {
                return cheerio.load(body);
            }
        };

        promises.push(new Promise(function(resolve, reject) {
            rp(generalOptions)
                .then(($) => {

                    $('.tableauResultat').find('tr').each(function(i, elem) {
                        let children = $(elem).children('td');
                        let name = $(children[0]).text();

                        results.push({name: name, result: 42, branch: branch});
                        resolve();

                    });
                })
                .catch((err) => {
                    console.log(err);
                    //reject();
                });
            }
        ));
    }
}

getResults(process.argv[2], process.argv[3]);


Promise.all(promises).then(() => {
    console.log("Finished!");
    var ws = fs.createWriteStream('results_bis.csv', {flag: 'w'});
    csv.write(results).pipe(ws);
}).catch((err) => {
    console.log(err);
});
Tom Février
  • 45
  • 1
  • 8
  • I believe your node application might be ending without waiting for the promises to resolve. Try one of these solutions https://stackoverflow.com/questions/44137481/prevent-nodejs-program-from-exiting – Ryan C Jul 20 '18 at 10:07
  • I have no problem with your code, except `someObject is not defined`. Node v8.11.1, npm 5.6.0 – Andrew Paramoshkin Jul 20 '18 at 10:08
  • @AndrewParamoshkin Yes that's because I simplified my code before posting it: someObject is an object created from the data scraped by cheerio :) – Tom Février Jul 20 '18 at 10:42
  • @RyanC I used the `process.stdin.resume();` solution and now my program never finishes (but `then(...)` is still not executed)... It's weird because all the promises seem to resolve when I log the data pushed to `results` – Tom Février Jul 20 '18 at 11:02
  • No console output at all? It's hard to believe. There should be at least some. console.log isn't an efficient way to debug scripts, you can't see empty entries. You don't need `process.stdin.resume()`. If there are pending promises, the program won't exit. You're rejecting with empty error `reject()`, you don't need `new Promise` there at all. this is promise construction antipattern. Also `, function(err) { console.log(err); })` won't catch the errors that happen in `then`, use `catch`. Please, provide https://stackoverflow.com/help/mcve that can replicate the problem. – Estus Flask Jul 20 '18 at 11:13
  • @estus It might be hard to believe but it's true: I don't have any console output or error message, even when using `catch` and deleting `reject()`... What do you mean when you say I don't need `new Promise`? Do you know another way to execute a function when all promises are resolved? – Tom Février Jul 20 '18 at 11:38
  • I have edited my code so that the problem can be replicated. To test it, try any combination of 59/62 and S/ES/L as arguments – Tom Février Jul 20 '18 at 11:45
  • It outputs numerous `start` and then `Finished!` with `59` and `S` args. `$('.tableauResultat').find('tr').each(function(i, elem) {` is quite messy - it doesn't work well with promises. Generally, you don't need `results` array at all - all results should become available in `Promise.all then`. – Estus Flask Jul 20 '18 at 11:51
  • @estus It doesn't output `Finished!` for other combinations though, I don't know why... About the `results` array, I need it because `csv.write` takes an array of objects as an argument... How is the `$('.tableauResultat').find('tr').each(function(i, elem)` function messy? It's not my fault if the webpage I am scrapping is messy lol – Tom Février Jul 20 '18 at 12:27
  • It's not the page is messy but the way you process results. If you have particular combination you have problem with, you need to specify it in the question. The question should contain all details that are needed to replicate the problem. You could use values directly instead of process.argv for that matter. – Estus Flask Jul 20 '18 at 13:55

1 Answers1

1

results array is usually an antipattern when being used with Promise.all. It's expected that promises that are passed to Promise.all return necessary results, so it's possible to access them as Promise.all(promises).then(results => { ... }).

There is no need for callback-based processing, $('.tableauResultat').find('tr').each(...), it results in poor control flow. Since Cheerio provides jQuery-like API, results can be converted to array and processed in a way that is idiomatic to vanilla JavaScript.

The code above uses promise construction antipattern. There's no need for new Promise if there's existing one rp(generalOptions). It contributes to the problem; Node.js exists when there's nothing scheduled for execution. Some of promises in promise are pending without a chance for them to be resolved because neither resolve nor reject is called. This happens if each callback is never triggered. The problem could be solved by moving resolve() outside each callback.

A more straightforward way to do this that leaves much less place for obscure problems and is easier to debug:

const promises = [];

function getResults(district, branch) {
    for (let i = 65; i <= 90; i++) {
        ...
        promises.push(
            rp(generalOptions)
            .then(($) => {
                const trs = $('.tableauResultat').find('tr').toArray();
                const results = trs.map(elem => {
                    let children = $(elem).children('td');
                    let name = $(children[0]).text();

                    return {name, result: 42, branch};
                });

                return results;
            })
            .catch((err) => {
                console.log(err);
                return null; // can be filtered out in Promise.all results
                // or rethrow an error
            })    
        );
    }
}

getResults(...);

Promise.all(promises).then(nestedResults => {
    const results = nestedResults.reduce((flatArr, arr) => flatArr.concat(arr), []);
    console.log("Finished!");
    var ws = fs.createWriteStream('results_bis.csv', {flag: 'w'});
    csv.write(results).pipe(ws);
}).catch((err) => {
    console.log(err);
});

Notice that errors from file streams are currently not handled. It's unlikely that Promise.all catch will ever be triggered, even if there are problems.

Estus Flask
  • 206,104
  • 70
  • 425
  • 565