I have a function that:
1. Gets an array of 3000+ 'id' properties from MongoDB documents in collection foo.
2. Makes a GET request for each id to fetch a 'resp' object for that id, and stores it in another database.

router.get('/', (req, res) => {

    var collection = db.get().collection('foo');
    var collection2 = db.get().collection('test');
    collection.distinct('id', (err, idArr) => { // count: 3000+
        idArr.forEach(id => {
            let url = 'https://externalapi.io/id=' + id;
            request(url, (error, response, body) => {
                if (error) {
                    console.log(error);
                } else {
                    let resp = JSON.parse(body);
                    collection2.insert(resp);
                }
            });
        });
    });
});

Node Error Log:

[0] events.js:163
[0]       throw er; // Unhandled 'error' event
[0]       ^
[0]
[0] Error: connect ETIMEDOUT [EXT URL REDACTED]
[0]     at Object.exports._errnoException (util.js:1050:11)
[0]     at exports._exceptionWithHostPort (util.js:1073:20)
[0]     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1093:14)

I am using simple-rate-limiter so as not to trigger the API's rate limit (25 calls per second):

const limit = require("simple-rate-limiter");
const request = limit(require("request")).to(20).per(1000);

But anywhere between 300 and 1700 requests in, I receive this error, which crashes Node on the command line. How can I handle this error to prevent my app from crashing?

I have tried a lot of error handling, but none of it was able to handle `connect ETIMEDOUT`.

Moshe
    I will link you to a previous answer of mine: https://stackoverflow.com/questions/29812692/node-js-server-timeout-problems-ec2-express-pm2/43806215#43806215 – arboreal84 May 24 '17 at 20:14
    How many requests are you running at the same time to the same host? My guess is you are running too many which is either a problem for your server or for the host you're trying get data from. – jfriend00 May 24 '17 at 20:14
  • I am trying to run 10 requests a second, for 2750 ids. The point is not to throttle, as I have a divide-and-conquer method I'm integrating. I'm just trying to catch the error and continue the program, while recording the `id` to a failureArr so I can re-run the script with the arr. @jfriend00 – Moshe May 24 '17 at 20:19
    Well ETIMEDOUT means something couldn't keep up with the pace you were throwing at it. You will have to slow down the number of requests, increase the relevant timeout time or some combination of both. We would need more info on where the error is occurring to know how you could handle the error more gracefully. FYI, sending requests too fast, causing errors because of that, recording which things failed and then trying them later is not good for anybody. It's likely not your max throughput and purposely running a system beyond its capacity and forcing errors is likely to cause other issues. – jfriend00 May 24 '17 at 20:25
  • You also need to figure out if the timeout error is coming from your call to `request()` or from a database call. You need to know which area code you should be looking for the problem in. You may also want to read [this timeout section](https://www.npmjs.com/package/request#timeouts) in the `request()` doc. – jfriend00 May 24 '17 at 20:32
  • @jfriend00 the timeout is coming from request() call based on the fact that the IP address followed by `Error: ETIMEDOUT` is the proxy server of my external API request. So that's figured out. Secondly, I believe that I am still sending requests too fast despite using the rate-limiter; in other words - my external API (3rd party company) has a rate limit on calls per second and day, however there could be another 'flag' to trigger a rate limit based on my experience with this 3rd party. I read the doc, and I don't really understand it – Moshe May 24 '17 at 20:39
    I'm not familiar with that rate limiter. There are a zillions ways for a host to decide to implement rate limiting. A simple requests/sec is only one way. You could be only sending 10 requests/sec, but if some of the requests take more than 0.1 sec to process, then you could get a build-up of open requests to hundreds and that could trigger limiting. The safest way I've found is to limit how many requests you have in flight at the same time, not just how fast they are sent. I like to use Bluebird's `Promise.map()` for that functionality, but you can code it manually too if you want. – jfriend00 May 24 '17 at 20:46
    Here's one scheme for controlling how many requests are "in-flight" at the same time: [Make several requests to an API that can only handle 20 requests at a time](https://stackoverflow.com/questions/33378923/make-several-requests-to-an-api-that-can-only-handle-20-request-a-minute/33379149#33379149) and [How to control how many promises access network in parallel](https://stackoverflow.com/questions/41028790/javascript-how-to-control-how-many-promises-access-network-in-parallel/41028877#41028877). – jfriend00 May 24 '17 at 20:54
    Also see: [Making a million requests](https://stackoverflow.com/questions/34802539/node-js-socket-explanation/34802932#34802932). – jfriend00 May 24 '17 at 20:56
    And, you might make sure your proxy doesn't have some sort of limit on it. – jfriend00 May 24 '17 at 20:56
  • @jfriend00 thanks so much for your valuable input, especially resources. I understand the problem, and I have a grasp for the solution. If you can provide an implementation the bluebird promise with my code (posted in Q), I'll mark it as correct – Moshe May 24 '17 at 21:15

1 Answer

As discussed in comments, if you want to control the max number of requests that are in-flight at the same time, you can use Bluebird to do that fairly easily like this:

const Promise = require('bluebird');
const rp = require('request-promise');

router.get('/', (req, res) => {

    let collection = db.get().collection('foo');
    let collection2 = db.get().collection('test');
    collection.distinct('id', (err, idArr) => { // count: 3000+
        if (err) {
            // handle error here, send some error response
            res.status(500).send(...);
        } else {
            Promise.map(idArr, id => {
                let url = 'https://externalapi.io/id=' + id
                return rp(url).then(body => {
                    let resp = JSON.parse(body);
                    // probably want to return a promise here too, but I'm unsure what DB you're using
                    collection2.insert(resp);
                }).catch(err => {
                    // decide what you want to do when a single request fails here
                    // by providing a catch handler that does not rethrow, other requests will continue
                });
                   // pick some concurrency value here that does not cause errors
            }, {concurrency: 10}).then(() => {
                // all requests are done, send final response
                res.send(...);
            }).catch(err => {
                // your code may never get here (depends upon earlier .catch() handler)
            });
        }
    });
});
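As mentioned in the comments, you can also code the in-flight limit manually instead of pulling in Bluebird. Here's a minimal, dependency-free sketch of that idea; the `runWithConcurrency` name and the "array of functions returning promises" task shape are illustrative assumptions, not part of any library:

```javascript
// Run at most `limit` async tasks in flight at the same time.
// `tasks` is an array of functions that each return a promise.
// Resolves with an array of results in the same order as `tasks`.
function runWithConcurrency(tasks, limit) {
    return new Promise((resolve, reject) => {
        if (tasks.length === 0) return resolve([]);

        const results = new Array(tasks.length);
        let next = 0;     // index of the next task to start
        let active = 0;   // number of tasks currently in flight
        let failed = false;

        function startNext() {
            if (failed) return;
            if (next >= tasks.length) {
                // nothing left to start; resolve once the last task finishes
                if (active === 0) resolve(results);
                return;
            }
            const index = next++;
            active++;
            tasks[index]().then(value => {
                results[index] = value;
                active--;
                startNext(); // a slot freed up, start the next task
            }, err => {
                failed = true;
                reject(err);
            });
        }

        // Prime the pool with up to `limit` tasks.
        for (let i = 0; i < limit && i < tasks.length; i++) {
            startNext();
        }
    });
}
```

You'd use it here by mapping each id to a function like `() => rp(url)`. The point, as discussed above, is that this caps how many requests are open at once rather than just how fast they start, so slow responses can't pile up into hundreds of open connections.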
jfriend00
  • here's a scenario: An Admin in the admin dashboard clicks a button that says "Update DB", when this button is clicked, this route gets triggered. Now, I can send a response immediately "ok", while all the work is being done in the background... how can I either get a confirmation response? i.e. "done" or just send a response to the client once all is done (waiting for this response can take >10 mins.) – Moshe May 24 '17 at 21:46
  • `// probably want to return a promise here too, but I'm unsure what DB you're using` < I'm using MongoDB, but which promise would I return here? – Moshe May 24 '17 at 21:47
    @Moshe - Browsers don't really like to wait for 10 minutes for an HTTP response. It can probably be done with various tricks at both ends. One idea is to have the client connect a webSocket or socket.io connection and you give an immediate HTTP response and then pass progress on the webSocket/socket.io connection. – jfriend00 May 24 '17 at 21:48
  • Also, for clarity: this is a script meant to run "server side", all the data being processed is 100% for the server. The only response the client is interested in is the fact that the request is being processed.. – Moshe May 24 '17 at 21:49
    You'd return a promise from the `.insert()` operation. That would synchronize things with your `.insert()` operation so you wouldn't be overloading your local DB and so if there were any errors there, you'd see them too. – jfriend00 May 24 '17 at 21:49
  • I've been working with the MEAN stack since January, and I'm just learning all this awesome server-side goodness. I don't understand how I would create/return a promise for the `.insert()` operation? I'm using the MongoDB native driver (willing to change to Mongoose when I see the need). – Moshe May 24 '17 at 21:51
    @Moshe - If you're using a version of MongoDB newer than 2.0.36, then promises are built in. You can just do `return collection2.insert(resp)` (if my read on the doc is correct). Example here: https://stackoverflow.com/a/38992795/816620. – jfriend00 May 24 '17 at 22:01
  • ` ReferenceError: error is not defined at rp.then.body ` – Moshe May 24 '17 at 22:17
    @Moshe - I removed one extra `)` typo in my answer `return rp(url).then(body => { ...})`. – jfriend00 May 24 '17 at 22:26