I am performing large-scale data processing via an async operation that performs hash lookups for a set of keys:
var async = require('async');

function dataProcess(keys, callback) {
    // PERFORMANCE TRACKING VARIABLES
    var iteratorCounter = 0;
    var timestampInitial = Date.now();
    var timestampLast = timestampInitial;
    // PERFORMANCE TRACKING VARIABLES

    async.mapLimit(keys, 100, iterator, function(err, result) {
        if (err) {
            callback(err, null);
        } else {
            // drop null/undefined entries before returning
            var returnArray = result.filter(function(n) {
                return n;
            });
            callback(null, returnArray);
        }
    });

    function iterator(key, iteratorCallback) {
        // lookup key in hash to retrieve a record ...
        // if non-null, do some light data-processing and return it

        // BEGIN PERFORMANCE TRACKING CODE
        iteratorCounter++;
        if (iteratorCounter % 1000 === 0) {
            var timestampNow = Date.now();
            var timeElapsedBatch = timestampNow - timestampLast;
            var timeElapsedTotal = timestampNow - timestampInitial;
            timestampLast = timestampNow; // assign to the outer variable; no re-declaration
            console.log('iterator count = ' + iteratorCounter);
            console.log('elapsed time this batch = ' + timeElapsedBatch);
            console.log('elapsed time total = ' + timeElapsedTotal);
        }
        // END PERFORMANCE TRACKING CODE

        setImmediate(iteratorCallback, null, record);
    }
}
I have added tracking code to enable performance testing.
I am running perf tests on sets of 100K+ keys. Overall performance is unacceptable at ~10 minutes for ~100K keys processed.
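(Back-of-the-envelope: ~100K keys in ~600 seconds is roughly 167 keys/second, so with a concurrency limit of 100 each lookup averages about 0.6 seconds.)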
Additionally, I am noticing significant non-linear performance degradation near the end of processing. For most of the run, per-batch processing time is relatively constant, varying between ~4 and ~7 seconds per batch of 1K keys. However, the final batches of 1K keys follow an erratic pattern, slowing sharply before tapering off: ~4 s => ~0 s => ~72 s => ~220 s => ~165 s => ~0 s.
My hunch is that the final performance degradation might in part be due to the async map operation having to push records into an array that has become quite large and costly to resize.
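One way to test this hunch might be to take the growing result array out of the picture entirely: since the output length is known up front, a pre-allocated array written by index can stand in for mapLimit's internally collected results. Below is a minimal sketch, assuming the async library's eachOfLimit (available since async v2) and a hypothetical lookupAndProcess helper in place of the elided hash lookup:

var async = require('async');

// Hypothetical stand-in for the elided hash lookup + light processing.
function lookupAndProcess(key, cb) {
    setImmediate(cb, null, null /* record or null */);
}

function dataProcessPreallocated(keys, callback) {
    // Pre-size the results array so it never has to grow incrementally.
    var results = new Array(keys.length);

    async.eachOfLimit(keys, 100, function(key, index, done) {
        lookupAndProcess(key, function(err, record) {
            if (err) { return done(err); }
            results[index] = record; // write by position instead of pushing
            done();
        });
    }, function(err) {
        if (err) { return callback(err, null); }
        // Drop null/undefined entries, as the original filter does.
        callback(null, results.filter(function(n) { return n; }));
    });
}

If the end-of-run spikes disappear with this variant, that would support the array-resizing theory; if not, the cause is probably elsewhere.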
What optimizations are available?