3

I have a huge bunch of documents as such:

{
  _id: '1',
  colors: [
    { value: 'red', count: 2 },
    { value: 'blue', count: 3}
  ]
  shapes: [
    { value: 'cube', type: '3d' },
    { value: 'square', type: '2d'}
  ]
},    
{
  _id: '2',
  colors: [
    { value: 'red', count: 7 },
    { value: 'blue', count: 34},
    { value: 'yellow', count: 12}
  ]
  shapes: [
    { value: 'prism', type: '3d' },
    { value: 'triangle', type: '2d'}
  ]
}

By using $unwind and $addToSet, as such:

db.getCollection('coll').aggregate([{$unwind:"$colors"},{$unwind:"$shapes"},{$group:{_id:null,colors:{$addToSet:"$colors"},shapes:{$addToSet:"$shapes"}])

I can get the following:

{
    "_id" : null,
    "colors" : [ 
        { "value" : "red", "count" : 2 }, 
        { "value" : "blue", "count" : 3 }, 
        { "value" : "red", "count" : 7 }, 
        { "value" : "blue", "count" : 34 }, 
        { "value" : "yellow", "count" : 12 }
    ]
    "shapes" : [
        { value: 'cube', type: '3d' },
        { value: 'square', type: '2d'}
        { value: 'prism', type: '3d' },
        { value: 'triangle', type: '2d'}
    ]
}

What I want however is to judge duplicates sole by the field "value" and sum up the "count" field for duplicates, i.e.

{
    "_id" : null,
    "colors" : [ 
        { "value" : "red", "count" : 9 }, 
        { "value" : "blue", "count" : 37 },  
        { "value" : "yellow", "count" : 12 }
    ]
    "shapes" : [
        { value: 'cube', type: '3d' },
        { value: 'square', type: '2d'}
        { value: 'prism', type: '3d' },
        { value: 'triangle', type: '2d'}
    ]
}

This question suggests that I could use $colors.value as an _id field and $sum to total the count. However, as I have a second array to $unwind and aggregate/$group, I am unsure of the best way to go about doing this.

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
Tacocat
  • 1,134
  • 8
  • 23

2 Answers2

5

Try running the following aggregation pipeline:

pipeline = [
    {"$unwind": "$colors"},
    {
        "$group": {
            "_id": "$colors.value",
            "count": { "$sum": "$colors.count" },
            "shapes": { "$first": "$shapes" }
        }
    },
    {"$unwind": "$shapes"},
    {
        "$group": {
            "_id": null,
            "colors": { 
                "$addToSet": {
                    "value": "$_id",
                    "count": "$count"
                }
            },
            "shapes": { "$addToSet": "$shapes" }            
        }
    }
];
db.getCollection('coll').aggregate(pipeline)

Sample Output

{
    "result" : [ 
        {
            "_id" : null,
            "colors" : [ 
                {
                    "value" : "red",
                    "count" : 9
                }, 
                {
                    "value" : "blue",
                    "count" : 37
                }, 
                {
                    "value" : "yellow",
                    "count" : 12
                }
            ],
            "shapes" : [ 
                {
                    "value" : "square",
                    "type" : "2d"
                }, 
                {
                    "value" : "cube",
                    "type" : "3d"
                }, 
                {
                    "value" : "triangle",
                    "type" : "2d"
                }, 
                {
                    "value" : "prism",
                    "type" : "3d"
                }
            ]
        }
    ],
    "ok" : 1
}

Note that the count value for the document { value: 'yellow', count: '12'} is a string, in the aggregation it will be discounted as 0 value since the $sum operator efficiently aggregates numerical values, otherwise string values are accumulated to zero default.


Inside the $group pipeline, you are now grouping the flattened colors array documents by the $colors.value field and then using the accumulators to return the desired aggregation on the grouped documents. The accumulator operator $first is used in this grouping operation because it returns a value from the first document for each group when the documents are in a defined order, in this case you want to return the shapes field as it is when all documents are grouped. It's more of a trick to maintain the order of documents within the pipeline.

One thing to note here is when executing a pipeline, MongoDB pipes operators into each other. "Pipe" here takes the Linux meaning: the output of an operator becomes the input of the following operator. The result of each operator is a new collection of documents. So Mongo executes the previous pipeline as follows:

collection | $unwind | $group | $unwind | $group => result

Thus the $first is necessary to get the shapes fields from the previous pipeline to the next.

chridam
  • 100,957
  • 23
  • 236
  • 235
  • This looks good. Let me try it out! I have also edited my post to remove the erroneous apostrophes -_- Typos galore! Much thanks! :) @chridam – Tacocat Mar 01 '16 at 07:32
  • Erm, can you explain to me what this does?: `"shapes": { "$first": "$shapes" }` I might be functioning under a misunderstanding.. :3 – Tacocat Mar 01 '16 at 07:36
  • 1
    @Tacocat Added some explanation, hope you will understand it's use within the pipeline. Here it's used more of a mechanism to get the `shapes` fields from the previous pipeline into the grouping without affecting the current aggregation. – chridam Mar 01 '16 at 08:01
0

So, you would need to rewrite your insert, or convert the string to integer of the count value.

Here is to insert:

db.so.insert([{
  _id: '1',
  colors: [
    { value: 'red', count: 2 },
    { value: 'blue', count: 3}
  ]
},    
{
  _id: '2',
  colors: [
    { value: 'red', count: 7 },
    { value: 'blue', count: 34},
    { value: 'yellow', count: 12}
  ]
}]);

To do the conversion on an existing table, you can look at this SO.

And then you can do a simple two steps aggregate query:

db.so.aggregate(
  [
    {
      $unwind: "$colors"
    },
    {
      $group: {
       _id : { color : "$colors.value"},
       count : { $sum: "$colors.count"},
       simple : { $sum: 1}
      }
    }
  ]
);
Community
  • 1
  • 1
Nicolas Modrzyk
  • 13,961
  • 2
  • 36
  • 40
  • I have edited my post to get rid of the apostrophes. Those were typos. This does not actually answer my question, unfortunately. I am aware that it is possible for me to use `$colors.value` as an `_id`. However, the issue here is that I wish to preserve and aggregate my `shape`s field as well and I am not sure as to what is the best way to achieving this, @Nicolas Modrzyk – Tacocat Mar 01 '16 at 01:23