
I have a MongoDB database with several collections: Collection1 holds records of books, Collection2 holds records of authors, and Collection3 holds records of owners (the Owner model).

Collection3 is approximately 500 MB in size.

I need to loop over the Owner collection and get the details about the book and author of each record in Collection3.

I have the following API call:

app.get('/api/linkCollections/', function(req, res){
    functions.linkCollections(function(finished){
        if( finished )
            res.json(finished);
    });
});

In a file named functions I have:

linkCollections: function(callback){
    var owners = Owner.find().cursor();
    owners.on('data', function(owner){

        Book.findOne({'bookName': owner.bookName}, function(err, book){
            if (err) {
                console.log(err);
            }else {
                if ( book !== null ) {
                    //do stuff to book
                }
            }
        });

        Author.findOne({'author': owner.bookAuthor}, function(err, author){
            if (err) {
                console.log(err);
            }else {
                if ( author !== null ) {
                    //do stuff to author
                }
            }
        });
    }).on('error', function(err){
        console.log('error retrieving records');
    }).on('close', function(){
        callback(true);
    }); 
}

I then use Fiddler to run this GET request, and it ends up returning the following error message:

[Fiddler] ReadResponse() failed: The server did not return a complete response for this request. Server returned 0 bytes.

How can I process each record of a very large collection without blocking the node event loop?
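
For illustration, here is a minimal sketch of what processing one record at a time could look like: each owner is only fetched after the lookups for the previous one have finished, and the callback-style `close` handler is replaced by the function's return. The names `fakeOwnerCursor`, `findBook`, `findAuthor` and `handle` are hypothetical stand-ins so the sketch runs without a database; with Mongoose one would presumably pass `Owner.find().cursor()` (which is async-iterable in recent versions) and wrappers around `Book.findOne(...).exec()` and `Author.findOne(...).exec()`. Calling `handle` only when both lookups succeed is also an assumption.

```javascript
// Stand-in for Owner.find().cursor(): any async iterable works here.
async function* fakeOwnerCursor(owners) {
  for (const owner of owners) {
    yield owner;
  }
}

// Hypothetical helper: visits owners one at a time and waits for both
// lookups (and the handler) to finish before pulling the next owner.
async function linkCollections(cursor, findBook, findAuthor, handle) {
  let processed = 0;
  for await (const owner of cursor) {
    // The two lookups are independent, so they can run concurrently.
    const [book, author] = await Promise.all([
      findBook({ bookName: owner.bookName }),
      findAuthor({ author: owner.bookAuthor }),
    ]);
    // Assumption: only owners with both a book and an author are handled.
    if (book !== null && author !== null) {
      await handle(owner, book, author);
      processed += 1;
    }
  }
  // Resolves only after every owner has been fully processed.
  return processed;
}
```

Because the function returns a promise that settles after the last record, the route handler can send its response from a `.then(...)` (or after an `await`) instead of relying on the cursor's `close` event.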

  • Of course. You simply are not waiting for each async operation to complete before iterating the next. Aside from that, [`$lookup`](https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/) is what you should be doing instead. And if you are still running a MongoDB that does not support it then it's really time to upgrade. And quite seriously since any version that does not support it is about to run out of official support. Server side joins have been around for a while now. – Neil Lunn Nov 03 '17 at 19:54
  • @NeilLunn Would you please elaborate? I do have a mongodb database version that supports the $lookup – JoaoFilipeClementeMartins Nov 03 '17 at 20:06
  • You could elaborate a bit yourself. What does "do stuff" mean? It makes a difference, and a big one, if there are even more async calls being hidden here. You're not actually returning any response in the function, so unless all you are doing is effectively "logging to console", it seems safe to presume you may be writing back to another or the same collection, or doing something else we need to be concerned about. Explain in your question, lest you don't get an answer you can actually apply. – Neil Lunn Nov 03 '17 at 20:22
  • @NeilLunn I was able to create an example of the lookup. You are right: I need to tie records together from several collections, produce a stitched record for all those that are valid, and write those into another collection. – JoaoFilipeClementeMartins Nov 03 '17 at 20:27
  • Then you can probably also make use of [`$out`](https://docs.mongodb.com/manual/reference/operator/aggregation/out/) if you don't "need" to write to an existing collection, or could at least replace one with the new result. That way it all happens on the server. – Neil Lunn Nov 03 '17 at 20:30
  • I'm just going to put a hold on this to the general reference question then, before the hungry piranhas all come running to post short answers saying "use `$lookup`", which won't really add anything new. – Neil Lunn Nov 03 '17 at 20:40
  • @NeilLunn Thanks, please do. I was able to use both $lookup and $out to produce the new collection with the aggregate result. Worked like a charm. However, the browser, Fiddler, or whatever client is executing the GET request never gets an answer back, even when the aggregation eventually finishes. Is that expected? What can be done to circumvent this? – JoaoFilipeClementeMartins Nov 03 '17 at 20:45
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/158192/discussion-between-joaofilipeclementemartins-and-neil-lunn). – JoaoFilipeClementeMartins Nov 03 '17 at 20:47
  • Show the code you are now using if you must. You do still need to return the callback, and you do actually need to send a response from your server, at least indicating it's done. All you put here was `callback(true)`, but typically you would simply `.aggregate([],callback);` and in your calling method `someClass.linkCollections(function(err,result) { res.send(result) })`. To be brief about it. – Neil Lunn Nov 03 '17 at 20:55
  • @NeilLunn Correct, I do return the callback as you posted here. But the aggregation takes a while, so the response is actually never sent to the client. If I wanted to run several aggregates, what would be the best practice? – JoaoFilipeClementeMartins Nov 03 '17 at 21:00
  • Probably best to [ask a new question](https://stackoverflow.com/questions/ask) and be specific about what you are now doing and why, size of collection(s) etc. This question was basically presented as "how do I join" framed within not awaiting async callbacks. "Takes too long" is just far too abstract. – Neil Lunn Nov 03 '17 at 21:06
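
The `$lookup` plus `$out` approach discussed in the comments could be sketched roughly as follows. This is not the code the asker ended up with (it was never posted), only an illustration under stated assumptions: the collection names `books` and `authors` and the output collection `linkedOwners` are hypothetical, the field names are taken from the question, and `bookName` and `author` are assumed to be unique so that each owner yields at most one stitched record. `runAggregate` is a hypothetical injection point so the handler can be exercised without a database; with Mongoose it would presumably be something like `(pipeline) => Owner.aggregate(pipeline).exec()`.

```javascript
// Builds the aggregation pipeline as plain data.
// Collection names are assumptions, not taken from the question.
function buildLinkPipeline(outCollection) {
  return [
    // Join each owner with its book on bookName.
    { $lookup: { from: 'books', localField: 'bookName', foreignField: 'bookName', as: 'book' } },
    // Join each owner with its author (owner.bookAuthor matches author.author).
    { $lookup: { from: 'authors', localField: 'bookAuthor', foreignField: 'author', as: 'author' } },
    // $unwind drops owners whose joined array is empty, i.e. no match found.
    { $unwind: '$book' },
    { $unwind: '$author' },
    // $out must be the last stage; it writes the result to a collection.
    { $out: outCollection },
  ];
}

// Returns an Express-style handler that always sends a response,
// on success as well as on failure.
function makeLinkHandler(runAggregate) {
  return async function (req, res) {
    try {
      await runAggregate(buildLinkPipeline('linkedOwners'));
      res.json({ finished: true });
    } catch (err) {
      res.status(500).json({ error: err.message });
    }
  };
}
```

It would be wired up as `app.get('/api/linkCollections/', makeLinkHandler(runAggregate))`. Note that the handler in the question only responds when `finished` is truthy, so any path where the callback is never invoked, or is invoked with a falsy value, leaves the client without a response.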
