0

I have a MongoDb collection called UserRecords. It stores all records for our users, where each user can have many records.

I am trying to calculate some basic statistics related to the number of records per user.

Specifically I would like the mean, median and mode of the number of records per user.

So far I have a query that groups all UserRecords by User_Id (uid) and counts the number of UserRecords for each user.

db.UserRecords.aggregate([
{$group: 
    {_id:{"uid":"$uid"},
    count:{$sum:1}}}
])

My query produces results that look like the following:

{ 
    "_id" : {
        "uid" : UUID("f22880a8-94d2-4524-a974-a2e500e2c2a2")
    }, 
    "count" : 100
}
{ 
    "_id" : {
        "uid" : UUID("1b3a3b81-d107-4345-8df5-a5ef00e23598")
    }, 
    "count" : 200
}

I would need my query to calculate the average of all the "count" values. For example, suppose the above results were the only 2 groups produced. I would need my query to do (100 + 200) / 2 = 150 and print that value of 150 to the console.

Does anyone know what I can add to my query to accomplish this?

*Edit, I would ideally like my result structure to be:

{
    "mean": 1000,
    "median": 850
    "mode": 900
}
DalhousieDuck
  • 329
  • 1
  • 16

2 Answers2

2

As mentioned in comments it might not be good practice to calculate median and mode in db level but just for enthusiasm try this.

Btw its calculating median and mode on the accumulated count of uids.

db.UserRecords.aggregate([
    {
        $group: {
            _id: { "uid": "$uid" },
            count: { $sum: 1 }
        }
    },
    {
        $group: {
            _id: null,
            mean: { $avg: "$count" },
            numbers: { $push: "$count" }
        }
    },
    {
        $addFields: {
            median: {
                $function: {
                    body: function(numbers) {
                        if (numbers.length === 0) return 0;
                        numbers.sort(function(a, b) { return a - b; });
                        var half = Math.floor(numbers.length / 2);
                        if (numbers.length % 2) return numbers[half];
                        return (numbers[half - 1] + numbers[half]) / 2.0;
                    },
                    args: ["$numbers"],
                    lang: "js"
                }
            },
            mode: {
                $function: {
                    body: function(numbers) {
                        return Object.values(
                            numbers.reduce((count, e) => {
                                if (!(e in count)) {
                                    count[e] = [0, e];
                                }

                                count[e][0]++;
                                return count;
                            }, {})
                        ).reduce((numbers, v) => v[0] < numbers[0] ? numbers : v, [0, null])[1];
                    },
                    args: ["$numbers"],
                    lang: "js"
                }
            }
        }
    },
    {
        $project: {
            _id: 0,
            numbers: 0
        }
    }
]);
Dheemanth Bhat
  • 4,269
  • 2
  • 21
  • 40
  • very impressive, nice answer! – DalhousieDuck Feb 28 '21 at 20:14
  • Thanks for asking this questions. I also learnt something new! – Dheemanth Bhat Feb 28 '21 at 20:28
  • 1
    I think another (maybe simpler) way of calculating mode is to just do 2 group stages. The first stage groups users by User_Id, then the second stage would group by the "count" field of the first stage. You could then sort the results in the descending order, and the first document would contain the mode. – DalhousieDuck Feb 28 '21 at 20:38
0

You can group with null as _id, in this case, the result is a single document with your accumulator expression applied to every document in the pipeline

According to MongoDB docs

If you specify an _id value of null, or any other constant value, the $group stage calculates accumulated values for all the input documents as a whole

db.UserRecords.aggregate([
    {
        $group: {
            _id: {"uid": "$uid"},
            count: {$sum: 1}
        },
    },
    {
        // will result in a single document which contains sumaries of
        // the previous groups data
        $group: {
            _id: null,

            recordsCount: {$sum: '$count'}, // the number of records in the collection
            usersCount: {$sum: 1}, // i.e "groups count"
        }
    },
    {
        $project: {
            mean: {$divide: ['$recordsCount', '$usersCount']}
            // ... you can add other measures here
        }
    }
])
  • 1
    `median` and `mode` will require a complex query, as sorting is required. In the second `$group` stage a field that holds sorted "count" is needed for those measures – Abdulhamid Zoubi Feb 28 '21 at 17:44