I have code that receives 100000 datasets, and a storage that must be accessed so that the next addition can start only once the previous one has finished.

Synchronously it would look like this, with the add method blocking:

var data = [...]; // 100000 datasets
var syncstorage = require( 'syncstorage' ); // synchronous storage

// length is a property of arrays, not a method
for( var i = 0 ; i < data.length ; i++ ) {
    syncstorage.add( data[i] ); // will only return once stored
}

The async storage does not block; instead it tells you when it is done through a callback:

/////// async storage 

asyncstorage.add( data[i] , function(err) { /* can only execute another add once i get this response */ } )

So far I have only come up with this:

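var i = 0;
function execute() {
    if( i >= data.length ){ // length is a property, not a method
        return;
    }
    // start the next add only from the callback of the previous one
    asyncstorage.add( data[i] , function(err) { i++; execute(); } )
}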

However, this would result in an extremely deep call stack.

Maybe I would need an event emitter and emit an event in that callback, as a kind of resource wrapper? How is this usually resolved? Sadly, I did not find anything on Stack Overflow concerning this specific issue.

ruled out solutions

  • async.each
  • async.series
    • http://caolan.github.io/async/docs.html#.series
      1. it requires an array of functions, not of data
      2. each function would have to call the series callback from inside the add callback, otherwise the adds would still run quasi-parallel
    • so it could be used, but only with disproportionate memory requirements (one function per dataset)
    • it is also unclear how async handles this internally (call-stack-wise)

test example

var store = {add:function(d,cb){cb(null)}};
var d=[]; for(var i = 0 ; i < 100000; i ++) { d.push(i)}; d;
var async = require('async');
async.eachSeries(d,store.add);

This does not work! That is because async assumes that the iteratee function involves an event emitter, so a simple test class like the one above runs into "Maximum call stack size exceeded".

Summer-Sky

1 Answer

Use Promises or Async

var async = require('async');

// Assuming that asyncstorage.add = function(chunk, callback) { ... }
async.eachSeries(data, asyncstorage.add, function(err) { 
    if (err)
        console.log(err);
    ...
});

The way to get rid of "Maximum call stack size exceeded" is to call nextTick. It will "give node.js the chance to clear the stack" (read more).
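The same idea can be applied to the recursive function from the question without any library. The following is only a minimal sketch: `createStore` and `addAll` are hypothetical names, and the stand-in store calls back synchronously, which is the worst case for stack depth. It uses `setImmediate` for the deferral, so each add starts from a fresh stack.

```javascript
// Hypothetical stand-in for asyncstorage. It calls back synchronously,
// which is the worst case for stack depth.
function createStore() {
    var items = [];
    return {
        items: items,
        add: function (chunk, cb) { items.push(chunk); cb(null); }
    };
}

// Adds data[0], data[1], ... strictly one after another.
function addAll(store, data, done) {
    var i = 0;
    function next(err) {
        if (err) return done(err);
        if (i >= data.length) return done(null);
        var chunk = data[i++];
        // Defer the call so the frames of the previous add unwind first.
        setImmediate(function () { store.add(chunk, next); });
    }
    next(null);
}

var data = [];
for (var i = 0; i < 100000; i++) data.push(i);

var store = createStore();
addAll(store, data, function (err) {
    console.log(err ? err : 'done ' + store.items.length);
});
```

Only one add is in flight at any time, and no array of 100000 functions is needed, just a counter and one closure.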

Improved code
Async

var async = require('async');

var store = {
    add: function(chunk, cb){
        res.push(chunk);
        cb(null);
    }
};
var data = []; 
for (var i = 0 ; i < 100000; i ++)  
    data.push(i);

var res = []; // for test result

async.eachSeries(data, 
    // on each iteration.
    function f(chunk, cb) {
        async.nextTick(function() {
            store.add(chunk, cb)
        });
    }, 
    // on done
    function(err) {
        console.log((err) ? err : ('done ' + res.length));
    }
);

Event Emitter

var data = []; 
for (var i = 0; i < 100500; i++) 
    data.push(i);

var store = {
    add: function (chunk, cb) { cb(null); }
};

var EventEmitter = require('events').EventEmitter;
var e = new EventEmitter;

e.on('next', function(i) {
    if (i >= data.length) // stop after the last item; data[data.length] is undefined
        return console.log(i, 'done'); 

    setImmediate(function() { // clear stack
        store.add(data[i], () => e.emit('next', i + 1))
    });
})

e.emit('next', 0);
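Promises

The Promises route mentioned at the top could look roughly like the sketch below. It assumes a Node.js version with `util.promisify` and `async`/`await` (Node 8 or later) and an `add` method that takes a node-style callback. `createStore` and `addAll` are hypothetical names, not part of any library.

```javascript
var util = require('util');

// Hypothetical stand-in for asyncstorage with a node-style callback.
function createStore() {
    var items = [];
    return {
        items: items,
        add: function (chunk, cb) { items.push(chunk); cb(null); }
    };
}

// Awaiting inside the loop starts the next add only after the previous
// one has called back.
async function addAll(store, data) {
    var add = util.promisify(store.add).bind(store);
    for (var chunk of data) {
        await add(chunk);
    }
}

var data = [];
for (var i = 0; i < 100000; i++) data.push(i);

var store = createStore();
addAll(store, data).then(
    function () { console.log('done ' + store.items.length); },
    function (err) { console.log(err); }
);
```

Each `await` resumes the loop from the event loop's microtask queue rather than from inside the callback, so the stack should stay flat even when `add` calls back synchronously.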
Aikon Mogwai
  • "Note, that since this function applies iteratee to each item in parallel, there is no guarantee that the iteratee functions will complete in order." Also, there is no guarantee that the previous function has completed before the next call. http://caolan.github.io/async/docs.html#.each – Summer-Sky Jul 28 '16 at 08:14
  • series would work, but very inelegantly: 1. because it is a task stack, not a data stack (functions); 2. the add function returns fast since it is async, so despite "each one running once the previous function has completed" the adds would be executed quasi-parallel too, not waiting for the async add to finish its callback. So I would have to fill an array with 100k functions for this. Please edit your answer if you mean something different than each. – Summer-Sky Jul 28 '16 at 08:33
  • @Summer-Sky: `async` is really efficient, so performance-wise, even if not optimal, it will be better than your current solution. – DrakaSAN Jul 28 '16 at 09:06
  • @Summer-Sky: What do you mean by "It's a task stack, not a data stack"? For point 2, it will wait until the callback from the iterator has completed, which means it will wait for `asyncstorage.add` to call back, which should mean the add is done and completed before a second one is started. – DrakaSAN Jul 28 '16 at 09:18
  • @DrakaSAN the functions require memory, each one of them (see V8 for that). See the series documentation: you pass in an array of functions, not an array of data (e.g. integers, strings...). – Summer-Sky Jul 28 '16 at 09:30
  • @DrakaSAN currently I have given no solution except the outline for the event emitter... – Summer-Sky Jul 28 '16 at 09:32
  • Heh, I say `Series`, but I mean `eachSeries` :) – Aikon Mogwai Jul 28 '16 at 09:38
  • Changed original answer. – Aikon Mogwai Jul 28 '16 at 10:28
  • @Aikon Mogwai good work +1 but not complete yet .. see my example – Summer-Sky Jul 28 '16 at 11:15
  • See my answer again. – Aikon Mogwai Jul 28 '16 at 13:13