3

I want to make http requests to an API-s to collect for each user it's data and insert into mongodb.

The problem I am having is, it is doing all the requests at once, and seems it gets stuck somewhere and I don't know what is going on.

Al thou I am using async library and add the request() method inside each iteration, and I dont know if this is the right way, here is the code:

  function iterateThruAllStudents(from, to) {
    Student.find({status: 'student'})
        .populate('user')
        .exec(function (err, students) {
            if (err) {
                throw err;
            }

            async.forEach(students, function iteratee(student, callback) {
                if (student.worksnap.user != null) {
                    var options = {
                        url: 'https://api.worksnaps.com/api/projects/' + project_id + '/time_entries.xml?user_ids=' + student.worksnap.user.user_id + '&from_timestamp=' + from + '&to_timestamp=' + to,
                        headers: {
                            'Authorization': 'Basic bGhNSVJkVUFwOE1DS2loOFVyZkFyOENEZEhPSXdCdUlHdElWMHo0czo='
                        }
                    };
                    request(options, getTimeEntriesFromWorksnap);
                }
                callback(); // tell async that the iterator has completed
            }, function (err) {
                console.log('iterating done');
            });
        });
}

    function getTimeEntriesFromWorksnap(error, response, body) {
        console.log(response.statusCode);
        if (!error && response.statusCode == 200) {
            parser.parseString(body, function (err, results) {
                var json_string = JSON.stringify(results.time_entries);
                var timeEntries = JSON.parse(json_string);
                _.forEach(timeEntries, function (timeEntry) {
                    _.forEach(timeEntry, function (item) {
                        saveTimeEntry(item);
                    });
                });
            });
        }
    }

    function saveTimeEntry(item) {
        Student.findOne({
                'worksnap.user.user_id': item.user_id[0]
            })
            .populate('user')
            .exec(function (err, student) {
                if (err) {
                    throw err;
                }
                student.timeEntries.push(item);
                student.save(function (err) {
                    if (err) {
                        console.log(err);
                    } else {
                        console.log('item inserted...');
                    }
                });

            });
    }

var from = new Date(startDate).getTime() / 1000;
startDate.setDate(startDate.getDate() + 30);
var to = new Date(startDate).getTime() / 1000;
iterateThruAllStudents(from, to);

I am new to JavaScript, especially when dealing with async.

Any help?

Lulzim Fazlija
  • 865
  • 2
  • 16
  • 37

2 Answers2

2

Use Async.eachLimit() to make batched request to the api...Try this iterateThruAllStudents() function.

I already had same question before here

See tutorial of limiting here. Though i am making the limit as 5 but you can do whatever you want(10,50 etc).

function iterateThruAllStudents(from, to) {
  Student.find({status: 'student'})
    .populate('user')
    .exec(function (err, students) {
      if (err) {
        throw err;
      }
      async.eachLimit(students,5,function iteratee(student, callback) {
        if (student.worksnap.user != null) {
          var options = {
            url: 'https://api.worksnaps.com/api/projects/' + project_id + '/time_entries.xml?user_ids=' + student.worksnap.user.user_id + '&from_timestamp=' + from + '&to_timestamp=' + to,
            headers: {
              'Authorization': 'Basic bGhNSVJkVUFwOE1DS2loOFVyZkFyOENEZEhPSXdCdUlHdElWMHo0czo='
            }
          };
          request(options,getTimeEntriesFromWorksnap(callback));
        }
      }, function (err) {
        console.log(err);
        console.log('iterating done');
      });
    });
}

function getTimeEntriesFromWorksnap(cb) {
  return function(error, response, body){
    console.log(response.statusCode);
    if (!error && response.statusCode == 200) {
      parser.parseString(body, function (err, results) {
        var json_string = JSON.stringify(results.time_entries);
        var timeEntries = JSON.parse(json_string);
        async.each(timeEntries,function(timeEntry,cb1){
          async.each(timeEntry,function(item,cb2){
            saveTimeEntry(item,cb2);
          },function(err){
            if(err)
              cb1(err);
            else
              cb1();
          })
        },function(err){
          if(err)
            cb(err);
          else
            cb();
        });
        //_.forEach(timeEntries, function (timeEntry) {
        //  _.forEach(timeEntry, function (item) {
        //    saveTimeEntry(item);
        //  });
        //});
      });
    }
    cb(null);
  }
}

function saveTimeEntry(item,cb2) {
  Student.findOne({
      'worksnap.user.user_id': item.user_id[0]
    })
    .populate('user')
    .exec(function (err, student) {
      if (err) {
        return cb2(err);
      }
      student.timeEntries.push(item);
      student.save(function (err) {
        if (err) {
          console.log(err);
          //return cb2(err);//Do it if you wanna throw an error.
        } else {
          console.log('item inserted...');
        }
        cb2();
      });
    });
}

var from = new Date(startDate).getTime() / 1000;
startDate.setDate(startDate.getDate() + 30);
var to = new Date(startDate).getTime() / 1000;
iterateThruAllStudents(from, to);
vkstack
  • 1,582
  • 11
  • 24
1

In your example you missed iteratee param in the each method of async - iteratee(item, callback). Look at this example here.

You need to call callback each time inside your iteratee function to tell async continue doing its processing.

each(collection, iteratee, [callback])

  • collection - collection to iterate over.

  • iteratee(item, callback) - function to apply to each item in coll. The iteratee is passed a callback(err) which must be called once it has completed. If no error has occurred, the callback should be run without arguments or with an explicit null argument. The array index is not passed to the iteratee. If you need the index, use forEachOf.

  • callback(err) - Optional callback which is called when all iteratee functions have finished, or an error occurs.

If you need synchronous behavior, no probs! There is also eachSeries method with the same signature except every collection item will be iterated synchronously.

UPDATE:

Changes should be implemented:

Pass async callback:

request(options, getTimeEntriesFromWorksnap(callback));

Return necessary for request callback function:

function getTimeEntriesFromWorksnap(callback) {
  return function(error, response, body) { 
    // ...       
    saveTimeEntry(item, callback);                       
    // ...
  }        
}

Call callback only after record is saved in database:

function saveTimeEntry(item, callback) {
  // ..
  student.save(callback);
  // ..
}

Refactor nested loops (not sure what timeEntries, timeEntry are, so use appropriate async method to iterate these data structures):

async.each(timeEntries, function (timeEntry, callback) {
   async.each(timeEntry, function (item, callback) {
       saveTimeEntry(item, callback);
   }, callback);
}, callback);
oleh.meleshko
  • 4,675
  • 2
  • 27
  • 40
  • 1
    I have updated my code, could you have a look and let me know if it's fine? – Lulzim Fazlija Apr 08 '16 at 22:14
  • 1
    Al thou, I can still see it's kinda making async requests.... I want to go thru all student , take all date once done, continue with next one, – Lulzim Fazlija Apr 08 '16 at 22:15
  • 1
    @HisniFazlija better, but not good yet. `request` is asynchronous, so you need to add the call of `callback()` right after response will be received (for every request witin your `async.each`). Look here: https://www.npmjs.com/package/request for more details. – oleh.meleshko Apr 08 '16 at 22:25
  • 1
    I dont see any option where can I add callback() func within request()... Could you give me a sample? thnx – Lulzim Fazlija Apr 08 '16 at 22:29
  • 1
    can I do smth like this: var r = request(options, getTimeEntriesFromWorksnap); r.on('response', function(response) { callback(); // tell async that the iterator has completed }) – Lulzim Fazlija Apr 08 '16 at 22:34
  • I get error, saying , Callback was already called, async.js:43, after I did changes according to ur new update – Lulzim Fazlija Apr 08 '16 at 22:52
  • @HisniFazlija ah, I see. The reason is your nested `_.forEach` inside `getTimeEntriesFromWorksnap`. You need to call callback once per iteration for the main `each` loop. In your case you need to replace these `_.forEach` loops with async methods accordingly. – oleh.meleshko Apr 08 '16 at 22:57
  • I updated those _forEach to: async.forEach(timeEntries, function iteratee(timeEntry, callback) { async.forEach(timeEntry, function iteratee(item) { saveTimeEntry(item,callback); }); }); still getting the same error – Lulzim Fazlija Apr 08 '16 at 23:02
  • It still does not seem working....got stuck somewhere, nothing is inserting.... is there any way that I can see where is it stuck the execution? – Lulzim Fazlija Apr 08 '16 at 23:45
  • well, I will try gathering the data for 1 day, and see if it works... maybe the json for 30 days is too big and it get's stuck... thnx – Lulzim Fazlija Apr 08 '16 at 23:49
  • When I try to print console.log(response.statusCode); within getTimeEntriesFromWorksnap(callback), I can see it being displayed lots of time at once, which means it is making lots of calls at once, right? Which I would want one call at a time... – Lulzim Fazlija Apr 09 '16 at 00:04
  • ok, I have been waiting for almost a hour to see what's happening after that stuck state, and I see it has return this: FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory – Lulzim Fazlija Apr 09 '16 at 01:52