
I am trying to build a simple web app that scrapes a website using Node.js and two modules, request and cheerio.

I managed to do it with the following code:

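    var request = require('request');
    var cheerio = require('cheerio');

    var printURL = function (url) {
        request(url, function (err, resp, body) {
            if (err)
                throw err;
            var $ = cheerio.load(body);

            // Print the src attribute of every <img> on the page.
            $('img').each(function () {
                console.log($(this).attr('src'));
            });
        });
    };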

It works fine for printing the URLs of the images on the website, but what I really want is to build a list of URLs that I can use outside of the function. I tried the following, but it returns an empty list:

    var urlList = [];
    var printURL = function (url) {
        request(url, function (err, resp, body) {
            if (err)
                throw err;
            var $ = cheerio.load(body);

            // Collect the src attribute of every <img> on the page.
            $('img').each(function () {
                urlList.push($(this).attr('src'));
            });
        });
    };

How can I fix this? Many thanks

Spearfisher

1 Answer


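You need to wait until the callback has run. `request` is asynchronous: `printURL` returns immediately, and the callback that fills `urlList` only runs later, once the page has been downloaded. Any code that reads `urlList` right after calling `printURL` therefore sees an empty list, so use the list inside the callback (or in a function called from it):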

    var request = require('request');
    var cheerio = require('cheerio');

    var urlList = [];
    var printURL = function (url) {
        request(url, function (err, resp, body) {
            if (err)
                throw err;
            var $ = cheerio.load(body);

            // cheerio's .each() is synchronous, so every src has been
            // pushed by the time it returns (even when there are no images).
            $('img').each(function () {
                urlList.push($(this).attr('src'));
            });

            // Now we have all images!
            console.log(urlList);
        });
    };
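If the goal is to use the list outside `printURL`, a common Node pattern is to pass the finished list to a callback instead of filling a shared global. The sketch below assumes a hypothetical `fakeRequest` that stands in for `request` plus the cheerio `$('img')` scan (so it runs without network access or extra modules) and a hypothetical helper name `getImageURLs`; it follows Node's error-first callback convention.

```javascript
// Hypothetical stand-in for request + cheerio: after a short delay it
// "returns" the src values an $('img') scan might have produced.
function fakeRequest(url, callback) {
  setTimeout(function () {
    callback(null, ['logo.png', 'photo.jpg']);
  }, 10);
}

// Hypothetical helper: collects the image URLs, then hands the complete
// list to done(err, urls) instead of throwing inside the async callback.
function getImageURLs(url, done) {
  fakeRequest(url, function (err, srcs) {
    if (err) return done(err);
    var urlList = [];
    srcs.forEach(function (src) {
      urlList.push(src);
    });
    done(null, urlList); // the list is complete at this point
  });
}

var received = null;
getImageURLs('http://example.com', function (err, urls) {
  if (err) {
    console.error(err);
    return;
  }
  received = urls; // code that needs the list goes here (or is called from here)
  console.log(urls);
});
```

With the real modules, the body of the `request` callback would call `cheerio.load(body)` and push `$(this).attr('src')` for each image, as in the question, before calling `done(null, urlList)`. Passing `err` to `done` rather than throwing it lets the caller decide how to handle a failed download.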

This is a consequence of the asynchronous nature of Node.js. If things get more complicated, I would recommend using a flow-control library such as async.
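Libraries such as async package this bookkeeping in helpers like `async.map`. As a rough, library-free sketch of what "waiting until all callbacks are done" involves when several pages are scraped at once, here is a counter-based version. `fakeRequest` and `getAllImageURLs` are hypothetical names, and `fakeRequest` replaces `request` plus cheerio so the example runs offline.

```javascript
// Hypothetical stand-in for request + cheerio: answers after a random
// delay, so the callbacks finish in an unpredictable order.
function fakeRequest(url, callback) {
  setTimeout(function () {
    callback(null, [url + '/a.png', url + '/b.png']);
  }, Math.floor(Math.random() * 20));
}

// Start every request at once and call done(err, urls) exactly once,
// after the last callback has run (or on the first error).
function getAllImageURLs(urls, done) {
  var results = [];
  var pending = urls.length;
  var finished = false;

  if (pending === 0) return done(null, []);

  urls.forEach(function (url, i) {
    fakeRequest(url, function (err, srcs) {
      if (finished) return; // an earlier request already failed
      if (err) {
        finished = true;
        return done(err);
      }
      results[i] = srcs; // store by index to keep the input order
      pending--;
      if (pending === 0) {
        finished = true;
        // Flatten [[...], [...]] into a single list of URLs.
        done(null, [].concat.apply([], results));
      }
    });
  });
}

var allURLs = null;
getAllImageURLs(['http://a.example', 'http://b.example'], function (err, list) {
  if (err) throw err;
  allURLs = list;
  console.log(list);
});
```

Storing each result by index keeps the output in input order even though the requests complete in any order, and the `finished` flag makes sure `done` is never called twice.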

TheHippo
  • The code works fine, thanks a lot! However, I'm not sure I understand what makes it work. Can you please guide me through the logic a bit further? Thanks a lot – Spearfisher Feb 27 '14 at 15:33
  • The code is not executed in the order you have written it down. Asynchronous functions are called back when Node has finished the job it had to do. You might find some useful links here: http://stackoverflow.com/questions/2353818/how-do-i-get-started-with-node-js – TheHippo Feb 27 '14 at 15:35
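The point about execution order can be checked with a small experiment. This sketch replaces `request` with a hypothetical `fakeRequest` built on `setTimeout` (an assumption, so it runs without network access) and records when each step happens; `collectURLs` is likewise a hypothetical name.

```javascript
// Hypothetical stand-in for request(url, callback): calls back
// asynchronously with a fixed list instead of downloading a page.
function fakeRequest(url, callback) {
  setTimeout(function () {
    callback(null, ['a.png', 'b.png']);
  }, 10);
}

var urlList = [];
var events = [];

function collectURLs(url) {
  fakeRequest(url, function (err, srcs) {
    if (err) throw err;
    srcs.forEach(function (src) {
      urlList.push(src);
    });
    events.push('callback: urlList has ' + urlList.length + ' items');
  });
  events.push('collectURLs returned');
}

collectURLs('http://example.com');
// Runs straight away, before fakeRequest has called back.
events.push('after the call: urlList has ' + urlList.length + ' items');

setTimeout(function () {
  console.log(events.join('\n'));
}, 50);
```

It prints `collectURLs returned`, then `after the call: urlList has 0 items`, and only then `callback: urlList has 2 items`. Reading `urlList` right after the call is exactly what produced the empty list in the question.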