3

I'm trying to work out exactly how to achieve an aggregation, I could manually unwind and group back together at the end, but I'm sure I should be able to achieve this in a more concise way so I wanted to throw it out as I'm getting stuck.

My document structure (skipping out the un-interesting bits) looks like:

{
    _id: ObjectId,
    panels: [
        {
            visConfig: {
                 dataConfig: {
                     columns: [
                         { element: "DX" },
                         { element: "SE" },
                     ]
                 }
            }
        },
        {
            visConfig: {
                 dataConfig: {
                     columns: [
                         { element: "AB" },
                         { element: "XY" },
                     ]
                 }
            }
        }
    ]
}

What I want to do is calculate a percentage of the element overlaps with a given set to be provided. So for example for the document shown it would produce 25% for the set ["DX"] or 50% for the set ["DX", "AB"].

So I've tried a few things, I think I've settled on the nearest so far as:

$project: {
   _id: 1,
   total: { $sum: { $size: "$panels.visConfig.dataConfig.columns" } }
}

But I'm getting an error here which I don't understand:

The argument to $size must be an array, but was of type: missing

Then I'm also having issues with my conditional aggregation which seems to be returning 0 for all of the element values.

{
  _id: 1,
  "panels.visConfig.dataConfig.columns.element": { 
      $sum: {
        $cond: [{
          $setIsSubset: [
            ["DX"], ["$panels.visConfig.dataConfig.columns.element"]
          ] 
        }, 1, 0 ],
      }
  },
}
Ian
  • 33,605
  • 26
  • 118
  • 198
  • what should be the output from above document? – Ashh Jul 31 '18 at 16:32
  • @AnthonyWinzlet I'm after something like `{ id: .... , match: 50 }` where `50` is some percentage value ideally. – Ian Jul 31 '18 at 16:35
  • Just to note for anyone else coming across this, the error I got with $size is addressed here https://stackoverflow.com/questions/24201120/mongodb-the-argument-to-size-must-be-an-array-but-was-of-type-eoo – Ian Jul 31 '18 at 21:19

3 Answers3

3

You can try below aggregation in 3.4 version.

db.colname.aggregate([
{"$project":{
  "_id":1,
  "total":{
    "$reduce":{
      "input":"$panels.visConfig.dataConfig.columns.element",
      "initialValue":0,
      "in":{"$add":["$$value",{"$size":"$$this"}]}
    }},
    "match":{
      "$sum":{
        "$map":{
          "input":"$panels.visConfig.dataConfig.columns.element",
          "in":{
            "$size":{
              "$setIntersection":[["DX","AB"],"$$this"]
            }
          }
        }
      }
    }
}},
{"$project":{
  "_id":1,
  "percent":{"$multiply":[{"$divide":["$match","$total"]}, 100]}
}}])

Update - You can perform both match and total calculations in $reduce pipeline.

db.colname.aggregate([
{"$project":{
  "_id":1,
  "stats":{
    "$reduce":{
      "input":"$panels.visConfig.dataConfig.columns.element",
      "initialValue":{"total":0,"match":0},
      "in":{
        "total":{"$add":["$$value.total",{"$size":"$$this"}]},
        "match":{"$add":["$$value.match",{"$sum":{"$map":{"input":"$$this","in":{"$cond":[{"$in":["$$this", ["DX","AB"]] }, 1, 0]}}}}]}

       }
    }}
}},
{"$project":{
  "_id":1,
  "percent":{"$multiply":[{"$divide":["$stats.match","$stats.total"]}, 100]}
}}])
s7vr
  • 73,656
  • 11
  • 106
  • 127
  • Thanks, I'll take a look at this tomorrow and compare against the other answer. Just reading it though made me wonder, could a `reduce` not also be done on `match` using a condition for the `input`? – Ian Jul 31 '18 at 19:51
  • Yw. I've added an update where I've calculated both match and total in the `$reduce` stage without the `$map` part. – s7vr Jul 31 '18 at 21:17
  • something I've noticed that I'm trying to work out, but maybe you can help is that it doesn't aggregate correctly if the same `columns` array features an element twice. e.g. `columns: [{ element: "DX" }, { element: "DX" }]` for a document will only count them once? – Ian Aug 01 '18 at 09:34
  • I didn't account for duplicates and I was processing all the elements under each config at once. Updated answer to process one element at a time i.e changed to use $map to iterate elements and compare against the input array. – s7vr Aug 01 '18 at 12:01
  • Thanks, I'm going to compare the before and after through the edit history to see the change. Then read some documentation to make sure I fully understand how this is working :) – Ian Aug 01 '18 at 12:38
  • Makes sense having gone through and documented it all - I feel like in MongoDB there are so many ways of trying to achieve the same thing, can be quite a challenge, especially if you head down the wrong path. I've posted a follow up question if you're interested about performance https://stackoverflow.com/questions/51637643/speeding-up-an-aggregation – Ian Aug 01 '18 at 15:59
1

You can use $map + $reduce to get an array of all element values and then using $divide you can divide $filter-ed $size by total $size:

db.col.aggregate([
    {
        $project: {
            elements: {
                $reduce: {
                    input: { 
                        $map: { 
                            input: "$panels", 
                            as: "panel", 
                            in: "$$panel.visConfig.dataConfig.columns.element" 
                        } 
                    },
                    initialValue: [],
                    in: { $concatArrays: [ "$$this", "$$value" ] }
                }
            }
        }
    },
    {
        $project: {
            percentage: {
                $divide: [ 
                    { 
                        $size: { 
                            $filter: { 
                                input: "$elements", 
                                as: "element", 
                                cond: { 
                                    $in: [ 
                                        "$$element", 
                                        [ "AB", "XY" ] // your input here
                                    ] 
                                } 
                            } 
                        } 
                    },
                    { $size: "$elements" }
                ]
            }
        }
    }
])
mickl
  • 48,568
  • 9
  • 60
  • 89
  • Thanks for the answer! I'm just trying it now, interestingly I'm getting an error `The argument to $size must be an array, but was of type: null` possibly because not all my documents have any `panels` or `columns`. – Ian Jul 31 '18 at 16:34
  • Additionally, there are a few things in there I'm going to have to go read about! Completely different to the approach I was trying to take, but looks quite promising. – Ian Jul 31 '18 at 16:34
  • Yes, you have to filter out all those null values, I guess you don't want any percentages for those docs so you can add $match as first pipeline stage, tried on your data set and looks fine, let me know if you need help with that filtering – mickl Jul 31 '18 at 16:36
0

Well, there are couple of ways to do this, but I these two pipelines show how I would do it.

var values = ["DX", "KL"]
  • First approach

    [
       {
          "$project": {
             "percent": {
                "$let": {
                   "vars": {
                      "allsets": {
                         "$reduce": {
                            "input": "$panels.visConfig.dataConfig.columns",
                            "initialValue": [],
                            "in": {
                               "$concatArrays": [ "$$this.element", "$$value" ]
                            }
                         }
                      }
                   },
                   "in": {
                      "$multiply": [
                         {
                            "$divide": [
                               {
                                  "$size": {
                                     "$setIntersection": [ "$$allsets", values ]
                                  }
                               },
                               { "$size": "$$allsets" }
                            ]
                         },
                         100
                      ]
                   }   
                 }
             }
          }
       }
    ]
    
  • Second approach same idea here but, using one pipeline stage.

    [
       {
          "$project": {
              "percent": {
                "$multiply": [
                   {
                      "$divide": [
                         {
                            "$sum": {
                               "$map": {
                                  "input": "$panels.visConfig.dataConfig.columns.element",
                                  "in": {
                                     "$size": {
                                       "$setIntersection": [ values, "$$this" ]
                                     }
                                  }
                               }
                            }
                         },
                         {
                            "$reduce": {
                               "input": "$panels.visConfig.dataConfig.columns.element",
                               "initialValue": 0,
                               "in": {
                                  "$add": [ "$$value", { "$size": "$$this" } ]
                               }
                            }
                         }
                      ]
                   },
                   100
                ]
             }
          }
       }
    ]
    
styvane
  • 59,869
  • 19
  • 150
  • 156