
Below is my code. I initialize each array element with its index, then use async.each to iterate over the array and call request to retrieve the URL contents. The request timeout is set to 500 ms.

var async = require('async');
var request = require('request');
var logger = require('log4js').getLogger();

var url = "http://www.wordpress.com";

var arr=new Array(100);
for ( var i=0; i<arr.length; i++){ arr[i]=i; }
async.each(arr, function(a, cb) {
  var ts1 = (new Date()).getTime();
  request(url, {timeout: 500}, function( err, res, body ) {
    var ts2 = (new Date()).getTime();
    logger.debug(`a=${a}, dt=${ts2-ts1}`);
    if ( err ) {
      logger.debug(`Error: ${err}, dt=${ts2-ts1}`);
      return cb(null);
    }
    else {
      //logger.debug(`OK: ${a}`);
      cb(null);
    }
  });
},
function(err) {
});

When array size is 100 I get 12 timeout errors:

[root@njs testreq]# node main.js | grep ETIME | wc -l
12
[root@njs testreq]# 

When array size is 1000 I get 1000 timeout errors:

[root@njs testreq]# node main.js | grep ETIME | wc -l
1000
[root@njs testreq]# 

What is the cause? How can I avoid it?

rlib

1 Answer

Timeout Problem

There are a couple problems here.

  1. You are setting an aggressive timeout, so it makes sense that the requests are timing out. The longer I made the timeout, the fewer requests timed out. When I removed the timeout, I got a 0% failure rate on up to 10,000 parallel requests (though 10,000 took quite a while to finish).
  2. Your code is not being a very good internet citizen. Making 1000 or more parallel requests to a web server is basically a mini DDoS attack. You should spread your requests over a longer period of time to give the web server a steadier, more even workload.
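One way to spread the load is to cap how many requests are in flight at once, which is what async.eachLimit does. Below is a dependency-free sketch of the same idea (the eachLimit helper and the simulated worker are stand-ins; in your code you would simply replace async.each with async.eachLimit(arr, 10, worker, done) and keep your request() call as the worker):

```javascript
// Sketch: cap in-flight work the way async.eachLimit does.
// This eachLimit is a dependency-free stand-in for async.eachLimit;
// the setTimeout worker below stands in for the real request() call.
function eachLimit(items, limit, worker, done) {
  var inFlight = 0, next = 0, finished = 0;
  function launch() {
    // Start new workers until the limit is reached or items run out.
    while (inFlight < limit && next < items.length) {
      inFlight++;
      worker(items[next++], function () {
        inFlight--;
        finished++;
        if (finished === items.length) return done(null);
        launch(); // a slot freed up: start the next item
      });
    }
  }
  if (items.length === 0) return done(null);
  launch();
}

var maxConcurrent = 0, active = 0;
eachLimit(Array.from({length: 50}, (v, k) => k), 10, function (a, cb) {
  active++;
  maxConcurrent = Math.max(maxConcurrent, active);
  setTimeout(function () { active--; cb(); }, 5); // stand-in for request(url, ...)
}, function () {
  console.log('max concurrent:', maxConcurrent);
});
```

With the limit set to 10, the server only ever sees 10 simultaneous connections instead of 1000, at the cost of a longer total run time.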

Code Clarity Improvements

There are also a couple things I noticed in your code that could be improved.

Array.from

If you want to create an array with 100 elements, you don't have to do

var arr = new Array(100)
for (var i = 0; i < arr.length; i++) { arr[i] = i }

You can replace this with

var arr = Array.from({length: 100}, (v, k) => k)

See Array.from for more info

Date.now

var timestamp = (new Date()).getTime()

can be replaced with

var timestamp = Date.now()

or

var timestamp = +new Date()

See Date.now for more info

JoshWillik
  • 1. No timeout is unacceptable: some requests took more than 11 sec to complete when the array size is 1000. 2. That's not a DDoS; these are requests to an RTB server (not wordpress in the real app), so it should support the load, I assume. The question is whether there is something I can tune on my side to avoid the timeouts. 3. Thanks for the Array.from() :) – rlib Mar 15 '16 at 01:59
  • Unfortunately, requests will take as long as they take. Either you can wait long enough to get the answer, or you can hang up early. There's no way to force the server on the other side to work through the traffic faster than it can. – JoshWillik Mar 15 '16 at 02:05
  • I understand that I cannot make the remote server work faster from my side. But the servers I connect to are RTB advertising platforms that should stand up to such load (see the RTB section at http://tech.adroll.com/ for example). So I assume it's me who is doing something incorrect. – rlib Mar 15 '16 at 02:12
  • If the data hasn't come back by the time you hang up, that's all there is to it. They didn't get back to you in time. There's not much more I can say. The code you posted looks too simple to have any more sinister bugs hiding inside it. – JoshWillik Mar 15 '16 at 02:15
  • Well, the problem is with NodeJS settings and it was solved here: http://stackoverflow.com/questions/36018208/nodejs-request-timeouts-with-concurrency-100 – rlib Mar 16 '16 at 16:25
  • @rlib, I learned something new today. Thanks for updating me with that answer! – JoshWillik Mar 16 '16 at 21:58