From my understanding, wouldn't it only ever have up to user_words.length possible connections open at a time, since those are the requests that are being done while the Promise.all waits to resolve? I don't see how the connections are not being closed.
No, this is not correct. You have two nested for
loops so you can have as many as user_words.length
* how many subreddits there are
all open at the same time. Remember, rp()
and Promise.all()
do not block so you run the nested for
loops to completion launching every single connection before any of the responses are processed.
It also looks like you're expecting to somehow synchronously return a result with the line of code return top_subreddit
. You can't do that either. You should return a promise that will eventually resolve to the desired result.
From my understanding, wouldn't it only ever have up to user_words.length possible connections open at a time, since those are the requests that are being done while the Promise.all waits to resolve? I don't see how the connections are not being closed.
This is not the right understanding of Promise.all()
. Promise.all()
is not blocking. It doesn't "wait" until all the promises are resolved before your code continues to exit. It behaves asynchronously. Your code continues to execute other iterations of your for
loop and Promise.all()
calls it's .then()
handler sometime in the future when all the promises you passed it have finished. The other iterations of your for
loop continue to run and pile up more sockets.
I think the easiest way to approach this is the make an array of URLs that you want to process and then use one of the asynchronous libraries that already has a function built-in to allow you to run a max of N async operations in-flight at the same time. Since your code is promise-based, I would choose Bluebird's Promise.map()
to process the list of URLs.
var Promise = require('bluebird');
function getBestSubreddit(messageText) {
var user_words = parse_message(messageText);
var top_subreddit = "";
var top_score = Number.MIN_SAFE_INTEGER;
return rp(dbUrl + '/.json?shallow=true').then(function(res) {
res = JSON.parse(res);
// build a list of URLs to process
var urls = [];
for (var subreddit in res) {
if (res.hasOwnProperty(subreddit)) {
for (var i = 0; i < user_words.length; i++) {
urls.push(dbUrl + '/' + subreddit + '/word_freqs/' + user_words[i] + '.json');
}
}
}
//
return Promise.map(urls, function(url) {
return rp(url);
}, {concurrency: 20}).then(function(allResults) {
// do any final processing of allResults here and return that value
// to become the resolved result of the returned promise
});
}
}
getBestSubreddit(someText).then(function(result) {
// process result here
}).catch(function(err) {
// handle error here
});
I set the number of concurrent requests to 20 in this example. You can experiment whether changing that to a higher or lower number improves your throughput. The ideal number is dependent upon a number of things including your local execution environment, the amount of data you are requesting and the bandwidth you have and on the target host you are making requests from and how it handles simultaneous requests. You may also need to be concerned about rate limiting by the target if you make too many requests too quickly.
Some other related answers:
How to make millions of parallel http requests from nodejs app?
In Node js. How many simultaneous requests can I send with the "request" package
Making a million requests
It still is not clear to me from your question exactly what result you're trying to get, but here's a version that collects all the data possible. You end up with an array of objects of this form {result: result, subreddit: subreddit, word: word}
where result
is the result of your rp()
for a given subreddit and a given word. You can then collate that set of results however you want:
var Promise = require('bluebird');
function getBestSubreddit(messageText) {
var user_words = parse_message(messageText);
var top_subreddit = "";
var top_score = Number.MIN_SAFE_INTEGER;
return rp(dbUrl + '/.json?shallow=true').then(function(res) {
res = JSON.parse(res);
// build a list of URLs to process
var requestData = [];
for (var subreddit in res) {
if (res.hasOwnProperty(subreddit)) {
for (var i = 0; i < user_words.length; i++) {
requestData.push({url:dbUrl + '/' + subreddit + '/word_freqs/' + user_words[i] + '.json', subreddit: subreddit, word: user_words[i]});
}
}
}
//
return Promise.map(requestData, function(url) {
return rp(requestData.url).then(function(result) {
return {result: result, subreddit: requestData.subreddit, word: requestData.word};
});
}, {concurrency: 20}).then(function(allResults) {
// now filter through all the data with appropriate subreddit
// allResults is an array of objects of this form {result: result, subreddit: subreddit, word: word}
// return whatever you want the final result to be after processing the allResults array
});
}
}
getBestSubreddit(someText).then(function(result) {
// process result here
}).catch(function(err) {
// handle error here
});