
I'm implementing a web scraper in Node.js with the Request and Cheerio libraries. I'm trying to save the scraped URL links into an array, but for some reason the array ends up undefined when I attempt to export it.

The console.log(url_dict) towards the end prints the data to the terminal, but if I export the module to another .js file and print it there (with console.log), I get undefined.

Any thoughts? Thanks so much for your time! :)

var request = require('request');
var cheerio = require('cheerio');

var senatorlist = 'http://en.wikipedia.org/wiki/Seniority_in_the_United_States_Senate';

var url_dict = [];

function lister() {
    request(senatorlist, function(err, resp, body) {
        if (err)
            throw err;
        var $ = cheerio.load(body);
        $('table.wikitable tr a').each(function(i, link){
            url_dict.push($(link).attr('href'));
        });
        console.log(url_dict); 
    });
}
techalicious
  • "export the module to another .js" don't see you exporting anything here, are we missing some code? – nowk Jun 11 '14 at 04:22
  • @kwon, the other .js is a line with require and a console.log. Just updated the original post to mention that! – techalicious Jun 11 '14 at 04:35

2 Answers

If your other app just has a require and a log, it apparently isn't waiting for the data from your lister() function, or even calling it. Pass a callback to your lister function:

function lister(callback) {
    request(senatorlist, function(err, resp, body) {
        if (err)
            throw err;
        var $ = cheerio.load(body);
        $('table.wikitable tr a').each(function(i, link){
            url_dict.push($(link).attr('href'));
        });
        console.log(url_dict);
        callback(url_dict);
    });
}

And in your other js file:

lister(function(url_dict) {
    console.log('other js url_dict:', url_dict);
});
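
For the two files to see each other, lister also has to be exported from the scraper module and required in the other file. The following is a minimal sketch of that wiring, not the asker's actual code: the file names scraper.js and other.js are hypothetical, and fetchLinks is a made-up stand-in for the request and cheerio work so the example runs without a network connection.

```javascript
// scraper.js (hypothetical file name)

// Stand-in for the request + cheerio scraping; setImmediate defers the
// callback the way a real network call would. The links are made-up data.
function fetchLinks(done) {
    setImmediate(function () {
        done(['/wiki/Example_A', '/wiki/Example_B']);
    });
}

// Same shape as lister(callback) above: hand the data to the caller
// only once it has actually arrived.
function lister(callback) {
    fetchLinks(function (links) {
        callback(links);
    });
}

// Export the function, not the array; the array is still empty at require time.
module.exports = { lister: lister };

// other.js (hypothetical file name) would then contain:
//   var scraper = require('./scraper');
//   scraper.lister(function (url_dict) {
//       console.log('other js url_dict:', url_dict);
//   });
```

Exporting the function rather than url_dict itself is the point of the sketch: the requiring file decides when to start the scrape and receives the data inside its callback.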
Jason Goemaat
  • This worked! Thanks :). If you have the time, could you explain this concept in a little more detail or point me to a few resources on it? I'm pretty new to NodeJS – techalicious Jun 11 '14 at 05:00
  • Basically, when you make an asynchronous call (anything requiring a callback function), code execution does not wait and continues on, so your data is still waiting to be loaded while the rest of your program could finish. Google some combinations of the words asynchronous, event-driven, javascript and node.js. Here's [another question](http://stackoverflow.com/questions/6898779/how-to-write-asynchronous-functions-for-node-js) about it. – Jason Goemaat Jun 11 '14 at 05:12
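
The ordering described in that comment can be seen in a small sketch. It assumes nothing from the question's code: fakeFetch is a hypothetical function, and setTimeout stands in for the network request.

```javascript
// Shared state, like url_dict in the question.
var results = [];
var order = [];

// Hypothetical async function; setTimeout simulates a slow network call.
function fakeFetch(callback) {
    setTimeout(function () {
        results.push('/wiki/Example'); // made-up data
        order.push('callback ran');
        callback(results);
    }, 10);
}

fakeFetch(function (data) {
    // Runs later, once the simulated request has finished.
    console.log('inside callback:', data);
});

// Runs immediately, before the callback above, so results is still empty.
order.push('main code finished');
console.log('right after the call:', results);
```

The "right after the call" line prints first with an empty array, which is the situation the other .js file was in when it logged the export straight after require.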

API CALL

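// Assumes Express is installed; request, cheerio, senatorlist and url_dict
// are defined as in the question.
var express = require('express');
var app = express();

// The route handler must be a callback function that receives req and res.
app.get("/someurl", function(req, res) {
    request(senatorlist, function(err, resp, body) {
        if (err)
            throw err;
        var $ = cheerio.load(body);
        $('table.wikitable tr a').each(function(i, link){
            url_dict.push($(link).attr('href'));
        });
        console.log(url_dict);
        // Respond only after the scrape has finished.
        res.json(url_dict);
    });
});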

Client Side

$http.get("/someurl").success(function(data, status, headers, config) {
    console.log("success", data);
    $scope.items = data;
}).error(function(data, status, headers, config) {
    console.log("error", data);
});

Try doing this on the server and pass the JSON to the client to display the data. Hope this solves your problem. :)

tadman
Vaibhav Magon
  • Thanks for your reply! Getting a few errors though: Since you're using app, should there be a require('express')? Also, does the .get need a callback function or can I just put the req, res directly in as parameters? – techalicious Jun 11 '14 at 04:42