
I have a Node.js RESTful API returning JSON data. One of the API calls can (and frequently does) take 10 - 20 seconds to finish. This long RTT is due to connecting to external APIs, like DiffBot, MailChimp, Facebook, Twitter, etc. I wish I could make the API call shorter, but I cannot.

Of course, I've implemented the node code in a nice async way, but the problem is that the client's inbound connection (to the node app) is alive while it waits for the server to finish, and thus might be killing my performance. In fact, I'm currently guessing that this may explain my long-running timeout issue in node.

I've already increased maxSockets to a huge number...

require('http').globalAgent.maxSockets = 9999;

For the sake of interest, I'm printing out the active sockets each time a new connection is made.
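A simplified sketch of that logging (not the exact code; it just counts open outbound sockets per host on the global HTTP and HTTPS agents):

var http = require('http');
var https = require('https');

// Count open outbound sockets per host, e.g. { 'graph.facebook.com:443': 5 }
function socketCounts(agent) {
  var counts = {};
  Object.keys(agent.sockets).forEach(function (host) {
    counts[host] = agent.sockets[host].length;
  });
  return counts;
}

http.createServer(function (req, res) {
  // Log outbound sockets on both global agents for every new inbound request
  console.log('SOCKETS:', socketCounts(http.globalAgent), socketCounts(https.globalAgent));
  // ... normal request handling ...
  res.end();
}).listen(3000);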

Which gives me output like this:

SOCKETS: {} { 'graph.facebook.com:443': 5, 'api.instagram.com:443': 1 }

Nothing too enlightening there. The max connections I ever see is around 20 or so, total, across all hosts. But this doesn't really tell me anything about incoming connections, or how to optimize them so that my server doesn't choke when there are many of them alive at once (which I suspect is what's happening).

1 Answer


You should optimize your architecture, not just the code.

First, I would change the way the client and server interact with each other. The server should end the request upon receipt and notify the client once all the tasks for that request are truly complete.

There are different ways to achieve that. For example, the client can poll the status of the request with AJAX every X seconds. Another option is to use WebSocket.
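A minimal sketch of the polling variant (Express, the route names, and the in-memory job store here are just placeholders, not a prescription):

var express = require('express');
var crypto = require('crypto');
var app = express();

var jobs = {}; // jobId -> { status: 'pending' | 'done' | 'failed', result: ... }

app.post('/tasks', function (req, res) {
  var id = crypto.randomBytes(8).toString('hex');
  jobs[id] = { status: 'pending' };

  // Kick off the slow external-API work, but don't make the client wait for it
  doSlowExternalCalls(function (err, result) {
    jobs[id] = err ? { status: 'failed' } : { status: 'done', result: result };
  });

  res.status(202).json({ id: id }); // 202 Accepted: "got it, check back later"
});

// The client polls this every X seconds until it sees status 'done' (or 'failed')
app.get('/tasks/:id', function (req, res) {
  res.json(jobs[req.params.id] || { status: 'unknown' });
});

// Stand-in for the 10-20 second calls to DiffBot, MailChimp, Facebook, etc.
function doSlowExternalCalls(cb) {
  setTimeout(function () { cb(null, { ok: true }); }, 15000);
}

app.listen(3000);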

If you're going with this approach, look into Socket.IO. It supports many transports behind the same API: if WebSocket is available it will use that; otherwise it falls back to other transports such as Flash Socket, long-polling, etc.
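With Socket.IO, the same idea looks roughly like this (the event names and the stand-in for the slow work are made up):

var server = require('http').createServer();
var io = require('socket.io')(server);

io.on('connection', function (socket) {
  // The client asks for the slow work over the socket...
  socket.on('task:start', function (params) {
    doSlowExternalCalls(params, function (err, result) {
      // ...and the server pushes the result the moment it's ready -- no polling
      socket.emit('task:done', { error: err ? String(err) : null, result: result });
    });
  });
});

// Stand-in for the slow external-API calls
function doSlowExternalCalls(params, cb) {
  setTimeout(function () { cb(null, { ok: true }); }, 15000);
}

server.listen(3000);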

Second, you shouldn't use one process to do all this work. You should use a queue (preferably a messaging system that supports queues) and run workers (separate processes) to do the "heavy lifting".

Personally, I use AMQP due to its features and portability (it's an open standard), but feel free to use any other queue system with a persistent backend.

That way, if one or more processes crash and you're using the right queue, you won't lose any data (such as the API tasks you mentioned).
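A sketch of that split with AMQP via the amqplib package (the queue name, message shape, and stand-in function are made up): the web process only publishes a job and responds; a separate worker process consumes it, does the slow external calls, and acks only when it's done, so a crash simply leaves the message in the queue to be redelivered.

// web process: publish the task and respond to the client right away
var amqp = require('amqplib');

amqp.connect('amqp://localhost').then(function (conn) {
  return conn.createChannel();
}).then(function (ch) {
  return ch.assertQueue('api_tasks', { durable: true }).then(function () {
    var task = { userId: 42, action: 'sync-social-accounts' };
    ch.sendToQueue('api_tasks', Buffer.from(JSON.stringify(task)), { persistent: true });
  });
});

// worker process (runs separately): do the slow work off the queue
amqp.connect('amqp://localhost').then(function (conn) {
  return conn.createChannel();
}).then(function (ch) {
  return ch.assertQueue('api_tasks', { durable: true }).then(function () {
    ch.prefetch(1); // take one un-acked task at a time
    ch.consume('api_tasks', function (msg) {
      var task = JSON.parse(msg.content.toString());
      doSlowExternalCalls(task, function () {
        ch.ack(msg); // ack only after the work finished, so a crash re-queues it
      });
    }, { noAck: false });
  });
});

// Stand-in for the 10-20 second external-API calls
function doSlowExternalCalls(task, cb) {
  setTimeout(cb, 15000);
}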

Hope it helps.
