3

I'm pretty new with MongoDB, long time MySQL guy and am running into a little roadblock.

Given the below sample data:
TeamID Day OrderAmount
100      4/1   50
100      4/1   40
200      4/2   50
100      4/2   20

I'm trying to find the average team order amounts per day. I'm able to do a simple query using mapReduce function to do a group by with the TeamId & Day. So now I have:

TeamID      Day      AvgAmount
100           4/1      45
200           4/2      50
100           4/2      20

Now I'm trying to roll that data up to get the average order amount per team per day which would be:

Day      AvgAmount
4/1      47.5
4/2      35

I can do this easily with MySQL but am having trouble figuring out how to do this with MongoDB without doing it manually in the app side rather than doing it with MongoDB.

james
  • 491
  • 6
  • 15

2 Answers2

4

You can calculate these aggregates with either map-reduce, or the group() function. I'm using group() because it's a bit simpler and faster, however you should use map-reduce if you need to distribute the query over a sharded cluster.

First load the data:

db.orders.insert( { teamid: 100, date: "4/1", amount: 50 })
db.orders.insert( { teamid: 100, date: "4/1", amount: 40 })
db.orders.insert( { teamid: 200, date: "4/2", amount: 50 })
db.orders.insert( { teamid: 100, date: "4/2", amount: 20 })

Per team, per day:

db.orders.group({
    key: { teamid: true, date: true },
    reduce: function(doc, out) { out.sum += doc.amount; out.count++; },
    initial: { sum: 0, count: 0 },
    finalize: function(out) { out.average = out.sum / out.count }
});

To roll up daily aggregates, just change the key:

db.orders.group({
    key: { date: true },
    reduce: function(doc, out) { out.sum += doc.amount; out.count++; },
    initial: { sum: 0, count: 0 },
    finalize: function(out) { out.average = out.sum / out.count }
});
Chris Fulstow
  • 41,170
  • 10
  • 86
  • 110
  • Thanks Chris, but your last example wouldn't account for a team having 2 orders in the same day. It would count it twice with how you have it, which I am able to do. Is there a way to use some sort of hash data type to keep track of what TeamID has made an order that day already so that I don't count the team twice? – james May 21 '11 at 18:56
  • @james Could you post the equivalent SQL for the result you need? – Chris Fulstow May 21 '11 at 23:58
  • Hi Chris, thanks for taking a stab at it, see my answer below to what I was looking for. – james May 24 '11 at 00:28
0

After a little research, I was able to come up with a solution. I'm not satisfy that it's the most optimized solution due to being new to MongoDB and map/reduce thinking so if anyone else has something better, please correct me. Specifically, I wasn't able to get a length of arr_team object, so I had to have a counter that incremented.

reduce function:


    function(doc, prev) { 
      var retVal  = {team_count: 0, day_total: 0};

      if(!prev.arr_team[doc.team_id]) {
        prev.arr_team[doc.team_id] = 0;
        prev.team_count++;
      }

      prev.arr_team[doc.team_id]++;

      prev.order_count++; 
      if(doc.total_amount)
         prev.total_amount += doc.total_amount 

      return retVal;
    }

Finalize:


function(out) {
      out.avg_team_order_amount = out.total_amount/out.team_count;
    }
james
  • 491
  • 6
  • 15