I'm currently scraping data from a webpage and pushing it to an array. The code currently looks like this:

url = //url 
let data = [];

request(url, function (err, response, html) {
    if (!err) {
        var $ = cheerio.load(html);

        $('#id').each(function (i, element) {
            data_element = //(this).find...

            data.push(data_element);
        });
    }

    console.log(data); //console logs the data inside the request
});

console.log(data); //logs empty array outside of request

The data is logged when I call `console.log` inside the request callback, but when I call it outside the request function it logs an empty array. I know I need to use a callback function, but I was wondering what the best way to go about this is, as I will be making multiple requests inside my function.

adevh
  • Try [Async](https://caolan.github.io/async/) utility – Brahma Dev Oct 23 '17 at 10:33
  • You can use `Promise`. – ricky Oct 23 '17 at 10:35
  • You can write a function and then call them inside your request. – edkeveked Oct 23 '17 at 10:35
  • Possible duplicate of [Why is my variable unaltered after I modify it inside of a function? - Asynchronous code reference](https://stackoverflow.com/questions/23667086/why-is-my-variable-unaltered-after-i-modify-it-inside-of-a-function-asynchron) – JJJ Oct 23 '17 at 10:38
  • A Promise is what you need. Or create a callback function that you call once the `each` iterations have finished – olyjosh Oct 23 '17 at 10:49
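
The callback suggestion in the comments above could look roughly like the sketch below. It is not the asker's code: `fakeRequest` is a hypothetical stand-in for `request` that uses `setTimeout` so the example runs without network access, and `scrapeAll` and the example URLs are names made up for illustration. The idea is to count outstanding requests and call a single `done` callback when the last one finishes.

```javascript
// Hypothetical stand-in for request(url, callback): calls back asynchronously.
function fakeRequest(url, callback) {
  setTimeout(() => callback(null, { statusCode: 200 }, `<p>${url}</p>`), 10);
}

// Hypothetical helper: runs one request per URL and calls `done` exactly once,
// either with the first error or with all results after every request finished.
function scrapeAll(urls, done) {
  const data = new Array(urls.length);
  let pending = urls.length;
  let failed = false;

  if (pending === 0) return done(null, []);

  urls.forEach((url, i) => {
    fakeRequest(url, (err, response, html) => {
      if (failed) return;            // an earlier request already reported an error
      if (err) {
        failed = true;
        return done(err);
      }
      data[i] = html;                // real code would parse html with cheerio here
      pending -= 1;
      if (pending === 0) done(null, data);
    });
  });
}

scrapeAll(['page-one', 'page-two'], (err, data) => {
  if (err) throw err;
  console.log(data);                 // runs only after both requests have completed
});
```

Storing each result at its own index keeps the output in the order of the input URLs, even if the requests finish out of order.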

2 Answers

You should use promises instead of callback functions. Bluebird is a good promise library.

Each function that makes a request should return a promise. You then "wait" for all of the promises to finish by using Promise.all, which resolves with an array of their results in the same order as the promises you passed in. At that point you can write it all to your log.

Here's an example:

const rp = require('request-promise'); // A package for Request with Bluebird promises
const Promise = require('bluebird');
const cheerio = require('cheerio');

const url = //url 
let data = [];

const options = {
    uri: url
};

// By default request-promise resolves with the response body (the HTML string),
// not the full response object, so it can be passed straight to cheerio.
const p1 = rp(options).then((html) => {
    const $ = cheerio.load(html);

    $('#id').each(function (i, element) {
        data_element = //(this).find...
        data.push(data_element);
    });

    console.log(data); //console logs the data inside the request

    return data; // this data will be available on results parameter on Promise.all 
});


const p2 = // another request

Promise.all([p1, p2]).then((results) => {
    console.log(results); // results[0] is p1's data, results[1] is p2's
}).catch((err) => {
    console.error(err); // Promise.all rejects as soon as any request fails
});
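
The same pattern can also be written with native promises and `async`/`await`. The following is a minimal sketch rather than a drop-in replacement: `fetchHtml` is a hypothetical stand-in for a promise-returning HTTP call such as `rp(options)`, simulated with `setTimeout` so the example runs offline, and `scrape` and `main` are illustrative names. Each request returns its own array instead of pushing to a shared one, and the results are merged after `Promise.all` resolves.

```javascript
// Hypothetical stand-in for a promise-returning HTTP call such as rp(options).
function fetchHtml(url) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`<li>${url}</li>`), 10);
  });
}

// Each scrape resolves with its own array of extracted items.
async function scrape(url) {
  const html = await fetchHtml(url);
  return [html];                     // placeholder for the cheerio extraction step
}

async function main() {
  // Start both requests at once, then wait for both to finish.
  const results = await Promise.all([scrape('first'), scrape('second')]);
  // Promise.all preserves input order, so results[0] belongs to 'first'.
  const data = results.flat();
  console.log(data);
  return data;
}

main().catch(console.error);
```

Returning a value from each promise avoids relying on a variable that is mutated from several callbacks.
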
Guy Segev

I think it's because the `let` declaration prevents it. Change it to `var` and it should work.

NashPL
  • It is not a matter of `let` or `var`. It is more related to javascript being asynchronous. – edkeveked Oct 23 '17 at 10:37
  • That would only be the case if another variable named `data` were declared in the inner block, that is, `let data = []` in the outer scope and `var data = somevalue` in the child block. Each reference to `data` would then resolve according to the scope it is called from, so code in the child block would see its own `data` – olyjosh Oct 23 '17 at 10:47
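
The point made in the comments, that the empty array comes from timing rather than from `let` versus `var`, can be checked with a small sketch. Here `setTimeout` stands in for the asynchronous request, and `run` and `log` are names invented for the illustration.

```javascript
// setTimeout stands in for the asynchronous request in the question.
function run(log) {
  var viaVar = [];
  let viaLet = [];

  setTimeout(() => {
    viaVar.push('scraped');
    viaLet.push('scraped');
    log.push(['inside', viaVar.length, viaLet.length]);
  }, 0);

  // This line runs before the callback above, whichever keyword was used.
  log.push(['outside', viaVar.length, viaLet.length]);
}

const log = [];
run(log);

// Give the callback time to run, then show the order of events.
setTimeout(() => console.log(log), 20);
// → [ [ 'outside', 0, 0 ], [ 'inside', 1, 1 ] ]
```

Both arrays are empty at the "outside" log and filled at the "inside" log, so switching the declaration keyword does not change the result.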