Since there's no urgency to any given request and you just want to make sure you hit each source periodically, you can space all the requests out evenly in time.
For example, if you have N sources and you want to hit each one once an hour, you can create a list of all the sources and keep track of an index for which source is next. Then, calculate how far apart you can space the requests and still get through them all in an hour.
So, if you had N requests to process once an hour:
let listOfSources = [...];
let nextSourceIndex = 0;
const cycleTime = 1000 * 60 * 60; // an hour in ms
const delta = Math.round(cycleTime / listOfSources.length);

// create interval timer that cycles through the sources
setInterval(() => {
    let index = nextSourceIndex++;
    if (index >= listOfSources.length) {
        // wrap back to start
        index = 0;
        nextSourceIndex = 1;
    }
    processNextSource(listOfSources[index]);
}, delta);

function processNextSource(item) {
    // process this source
}
Note: if you have a lot of sources and each one takes a little while to process, you may still have more than one source "in flight" at the same time, but that should be OK.
If the processing were really CPU- or network-heavy, you would have to keep an eye on whether you're getting bogged down and can't get through all the sources in an hour. If that were the case, then depending on the bottleneck, you might need more bandwidth, faster storage, or more CPUs applied to the problem (perhaps using worker threads or child processes).
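As a rough illustration of "keeping an eye on it", here's a minimal sketch that warns when a full pass through the sources takes longer than the target cycle. It assumes processNextSource() calls a hypothetical onSourceProcessed() hook when each source finishes:

let cycleStart = Date.now();
let processedThisCycle = 0;

// hypothetical hook: call this from processNextSource() when a source finishes
function onSourceProcessed() {
    if (++processedThisCycle >= listOfSources.length) {
        const elapsed = Date.now() - cycleStart;
        if (elapsed > cycleTime) {
            console.warn(`full cycle took ${elapsed} ms, over the ${cycleTime} ms target`);
        }
        processedThisCycle = 0;
        cycleStart = Date.now();
    }
}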
If the number of sources is dynamic, or the time to process each one is dynamic, and you're anywhere near your processing limits, you could make this system adaptive: if it was getting overly busy, it would automatically stretch the cycle out to longer than an hour, and conversely, if things were not so busy, it could shorten the cycle and visit the sources more frequently. This would require keeping track of some stats, calculating a new cycleTime each cycle, and adjusting the timer accordingly.
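Here's a sketch of one adaptive approach, using chained setTimeout() calls instead of setInterval() so the spacing can be recalculated on every request. It reuses listOfSources, nextSourceIndex, and processNextSource() from the code above (replacing the setInterval() version), and it assumes processNextSource() returns a promise. The 10% stretch rule is just a placeholder for whatever stats-based tuning you actually want:

let cycleTime = 1000 * 60 * 60; // start with a one-hour cycle

function scheduleNext() {
    const delta = Math.round(cycleTime / listOfSources.length);
    setTimeout(async () => {
        const start = Date.now();
        await processNextSource(listOfSources[nextSourceIndex]);
        nextSourceIndex = (nextSourceIndex + 1) % listOfSources.length;
        // hypothetical tuning rule: if this source took longer than its time
        // slot, stretch the cycle by 10%; a fuller version could also shrink
        // the cycle again when the system is not busy
        if (Date.now() - start > delta) {
            cycleTime = Math.round(cycleTime * 1.1);
        }
        scheduleNext();
    }, delta);
}

scheduleNext();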
There are other types of approaches too. A common procedure when you have a large number of asynchronous operations to get through is to process them in a way that N of them are in flight at any given time (where N is a relatively small number such as 3 to 10). This generally avoids overloading any local resources (such as memory, sockets in flight, bandwidth, etc...) while still allowing some parallelism in the network aspect of things. This is the type of approach you might use if you want to get through all of them as fast as possible without overwhelming local resources, whereas the previous discussion is more about spacing them out in time.
Here's an implementation of a function called mapConcurrent() that iterates an array asynchronously with no more than N requests in flight at the same time. And here's a function called rateMap() that is even more advanced in the types of concurrency controls it supports.
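As a rough, minimal sketch of the general idea behind mapConcurrent() (not the referenced implementation itself): run fn() on each array item with no more than maxInFlight calls pending at once, resolving with the results in order:

function mapConcurrent(array, maxInFlight, fn) {
    return new Promise((resolve, reject) => {
        const results = new Array(array.length);
        let nextIndex = 0;
        let inFlight = 0;
        let completed = 0;

        if (array.length === 0) {
            resolve(results);
            return;
        }
        function runMore() {
            // launch calls until we hit the concurrency limit or run out of items
            while (inFlight < maxInFlight && nextIndex < array.length) {
                const i = nextIndex++;
                inFlight++;
                Promise.resolve(fn(array[i], i)).then(result => {
                    results[i] = result;
                    inFlight--;
                    completed++;
                    if (completed === array.length) {
                        resolve(results);
                    } else {
                        runMore();
                    }
                }, reject);
            }
        }
        runMore();
    });
}

// usage: process all sources with at most 5 requests in flight at once
// mapConcurrent(listOfSources, 5, processNextSource).then(results => { ... });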