2

Problem

I have 3 collections in MongoDB

  • regions
    • each document corresponds to one geographical region, has a field for the region name, and another field which is an array of farms within the broader region
  • details
    • this collection contains documents, each one relating to a particular farm, and with various fields relating to details of that farm, e.g. number of cows
  • yield
    • this collection again contains documents where each one relates to a particular farm, and the fields in this instance are for farm output per day

I am trying to write a function that will start with the regions collection, for the first region it then takes each individual farm ID, and uses this to query the other two collections getting the total yield and total number of cows for that farm, then summing for each farm to get a total for the region.

Attempt

I first tried using straightforward mongodb calls for just a single region

var db = client.connect('mongodb://localhost:27017/mydb', function(err,db) {
  if (err) throw err;

  var regions = db.collection('Regions');
  var details = db.collection('Details');
  var yield = db.collection('Yield');

regions.find({"region" : "Gotham"}).toArray(function(err, docs) {

  for (var k = 0; k < docs.length; k++) {
    var regionYield = 0;

    for (var j = 0; j < docs[k].farms.length; j++) {
      var farmYield = 0;
      var farmID = docs[k].farms[j]
      yield.find({Farm_ID: farmID}).toArray(function(err, docs) {

        for (var i = 0; i < docs.length; i++) {
          farmYield += +docs[i].Yield;
        }
        console.log('finished inner loop');

        regionYield += farmYield;
      });
    }
    console.log('finished middle loop');
  }
  console.log('finished outer loop');
});

After the completion of the outer loop I want to do something with the final regionYield value, but as it is structured now, the outer loop finishes before the necessary computation is finished in the inner loop, due to the async mongo call. I just can't figure out to get around this. I have looked at a lot of questions/answers here explaining callbacks but I just can't figure out how to apply it to my case.

Blakes Seven
  • 49,422
  • 14
  • 129
  • 135
Philip O'Brien
  • 4,146
  • 10
  • 46
  • 96
  • 2
    [This answer](http://stackoverflow.com/questions/13221262/handling-asynchronous-database-queries-in-node-js-and-mongodb/13222248#13222248) gives a good example of the two primary approaches to solving this problem. – JohnnyHK Sep 05 '15 at 16:24

3 Answers3

3

I think you are approaching this the wrong way in your design, but more on that later. First on to basically changing your listing.

The simple and no extra dependency approach would be to use the node stream interface that is directly supported by the driver. This allows you to process each document in turn and also does not load all the content into memory as with .toArray().

Also as a "stream" there is an "end" event that fires, and natuaral flow control that can wrap the queries issued:

var mongodb = require('mongodb'),
    MongoClient = mongodb.MongoClient;


MongoClient.connect('mongodb://localhost/mydb',function(err,db) {
  if (err) throw err;

  var regions = db.collection('Regions'),
      yield = db.collection('Yield');

  var resultHash = {};

  var rstream = regions.find({ "region": "Gotham" });

  rstream.on('err',function(err) {
    throw err;
  });

  rstream.on('end',function() {
    console.log( 'Complete' );
    console.log( JSON.stringify( resultHash, undefined, 2 ) );
    db.close();
  });

  rstream.on('data',function(region) {
    rstream.pause();                  // pause outer stream

    resultHash[region.region] = 0;

    var fstream = yield.find({ "Farm_ID": { "$in": region.farms } });

    fstream.on('err',function(err) {
      throw err;
    });

    fstream.on('end',function() {
      console.log('finished farms');
      rstream.resume();i              // resumes outer stream
    });

    fstream.on('data',function(farm) {
      fstream.pause();                // pause inner stream
      resultHash[region.region] += farm.Yield;
      fstream.resume();               // resume inner stream
    });

  });

});

And that is basically going to "sum up" all of your "Yield" values for the matched "region" of the other document. Also note the very simple use of $in to pass in all "farms" which is already in an array rather than processing another loop.

But you really should not be doing this anyway. Databases are "smart" and you design needs to be smarter.

You can avoid all of the trouble here by basically adding "region" data to your "yield" data. Then this is just a matter of running .aggregate():

So with data like this in "yield":

{ "region": "Gotham", "Yield": 123 }

Then run just this code:

yield.aggregate(
    [
        { "$group": {
            "_id": "$region",
            "yield": { "$sum": "$Yield" }
        }}
    ],
    function(err,results) {

    }
);

And it is all done, without loops or calculations in code.

So if you simply add the "related data" such as "region" to the "yield" data that you want to work with, then MongoDB already has the tools that make accumulating on that key a breeze.

This is what it means to move away from relational design. Things work differently, so you need to work differently with them. And smarter.

Blakes Seven
  • 49,422
  • 14
  • 129
  • 135
  • Excellent answer, thank you. You have really helped me to look at this from a different angle. Unfortunately my yield values are being stored as strings, so before I can utilise `$sum` I'll have to convert them, but this looks like a much better solution. For the moment I have progressed my own solution by using `async.parallel` with the use of closures but it is much more verbose than your solution. – Philip O'Brien Sep 08 '15 at 08:33
  • 1
    @robocode If you data for values you wish to "sum" is stored as "strings" then you should change that. Not to mention that I am basically saying change your data to suit the problem anyway. Cannot see what `async.parallel` would really add here other than processing multiple "regions" at once. But the **real** anwer here always was "change the way you store your data". Let the "database" aggregate things. You should not be pulling every document from your database and doing that work in code. – Blakes Seven Sep 08 '15 at 08:50
  • What an eye-opener. I just restructured my data and can now run those aggregate queries with ease! What you say makes so much sense, thanks. – Philip O'Brien Sep 08 '15 at 13:54
  • @robocode Well that is what you really needed to learn here. So much more simple when you structure according to how a NoSQL datastore expects to deal with it. Glad you learned something. Bit of a shame I did not catch the question earlier. – Blakes Seven Sep 08 '15 at 14:07
  • No shame at all, it made me go through the pain of attempting it the wrong way. Now I really value the ease of your solution. Lessons learned the hard way tend to last the longest – Philip O'Brien Sep 08 '15 at 14:41
  • @robocode Cool. Please don't make me go through the same pain I just had on the "last bounty answer" where the OP did not award and only because the vote count was enough that it was automatically awarded after expiry. When the answer is the answer, then just award the bounty. You don't get it back if not awarded. Sorry. And therefore the shame. – Blakes Seven Sep 08 '15 at 14:55
  • 1
    I did award the bounty – Philip O'Brien Sep 08 '15 at 14:56
0

You can use async library to make it easier to work with nested async calls.

Keito Aino
  • 86
  • 1
  • 5
0

Also You can use let instead of var in your loop.

var db = client.connect('mongodb://localhost:27017/mydb', function(err,db) {
  if (err) throw err;

  var regions = db.collection('Regions');
  var details = db.collection('Details');
  var yield = db.collection('Yield');

regions.find({"region" : "Gotham"}).toArray(function(err, docs) {

  for (let k = 0; k < docs.length; k++) {
    var regionYield = 0;

    for (let j = 0; j < docs[k].farms.length; j++) {
      var farmYield = 0;
      var farmID = docs[k].farms[j]
      yield.find({Farm_ID: farmID}).toArray(function(err, docs) {

        for (let i = 0; i < docs.length; i++) {
          farmYield += +docs[i].Yield;
        }
        console.log('finished inner loop');

        regionYield += farmYield;
      });
    }
    console.log('finished middle loop');
  }
  console.log('finished outer loop');
});

That will work fine..

Abhishek Parikh
  • 949
  • 1
  • 20
  • 41