2

I'm trying to dynamically query a database that looks like this:

db.test.insert({
    "_id" : ObjectId("58e574a768afb6085ec3a388"),
    "place": "A",
    "tests" : [
        {
            "name" : "1",
            "thing" : "X",
            "evaluation" : [
                {
                    "_id": ObjectId("58f782fbbebac50d5b2ae558"),
                    "aHigh" : [1,2],
                    "aLow" : [ ],
                    "zHigh" : [ ],
                    "zLow" : [1,3]
                },
                {
                    "_id": ObjectId("58f78525bebac50d5b2ae5c9"),
                    "aHigh" : [1,4],
                    "aLow" : [2],
                    "zHigh" : [ 3],
                    "zLow" : [ ]
                },
                {
                    "_id": ObjectId("58f78695bebac50d5b2ae60e"),
                    "aHigh" : [ ],
                    "aLow" : [1,2,3],
                    "zHigh" : [1,2,3,4],
                    "zLow" : [ ]
                },]
            },
            {
            "name" : "1",
            "thing" : "Y",
            "evaluation" : [
                {
                    "_id": ObjectId("58f78c37bebac50d5b2ae704"),
                    "aHigh" : [1,3],
                    "aLow" : [4],
                    "zHigh" : [ ],
                    "zLow" : [3]
                },
                {
                    "_id": ObjectId("58f79159bebac50d5b2ae75c"),
                    "aHigh" : [1,3,4],
                    "aLow" : [2],
                    "zHigh" : [2],
                    "zLow" : [ ]
                },
                {
                    "_id": ObjectId("58f79487bebac50d5b2ae7f1"),
                    "aHigh" : [1,2,3],
                    "aLow" : [ ],
                    "zHigh" : [ ],
                    "zLow" : [1,2,3,4]
                },]
            }
            ]
        })
db.test.insert({
    "_id" : ObjectId("58eba09e51f7f631dd24aa1c"),
    "place": "B",
    "tests" : [
        {
            "name" : "2",
            "thing" : "Y",
            "evaluation" : [
                {
                    "_id": ObjectId("58f7879abebac50d5b2ae64f"),
                    "aHigh" : [2],
                    "aLow" : [3 ],
                    "zHigh" : [ ],
                    "zLow" : [1,2,3,4]
                },
                {
                    "_id": ObjectId("58f78ae1bebac50d5b2ae6db"),
                    "aHigh" : [ ],
                    "aLow" : [ ],
                    "zHigh" : [ ],
                    "zLow" : [3,4]
                },
                {
                    "_id": ObjectId("58f78ae1bebac50d5b2ae6dc"),
                    "aHigh" : [1,2],
                    "aLow" : [3,4],
                    "zHigh" : [ ],
                    "zLow" : [1,2,3,4]
                },]
            }
            ]
        })

In order to query the database, I have an object that is created by another part of my program. It comes in the form of:

var outputObject = {
    "top": {
        "place": [
        "A"
        ]
    },
    "testing": {
        "tests": {
            "name": [
                "1",
            ],
            "thing": [
                "X",
                "Y"
            ]
        }
    }
    }

I then use that outputObject and $match statements within the aggregate framework to execute the query. I have included two queries which do not seem to work.

db.test.aggregate([
        {$match: {outputObject.top}},
        {$unwind: '$tests'},
        {$match: {outputObject.testing}},
        {$unwind: '$tests.evaluation'},
        {$group: {_id: null, uniqueValues: {$addToSet: "$tests.evaluation._id"}}}
    ])

db.test.aggregate([
        {$match: {$and: [outputObject.top]}},
        {$unwind: '$tests'},
        {$match: {$and: [outputObject.testing]}},
        {$unwind: '$tests.evaluation'},
        {$group: {_id: null, uniqueValues: {$addToSet: "$tests.evaluation._id"}}}
    ])

However, this approach does not seem to be functioning. I have a couple questions:

  1. Do I need to modify the object outputObject before applying it to the $match statement?
  2. Are my queries correct?
  3. Should I be using $and or $in in combination with the $match statement?
  4. What code will produce the desired result?

Currently using mongoDB 3.4.4

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
black_sheep07
  • 2,308
  • 3
  • 26
  • 40

2 Answers2

2

It is better to agree on a fixed format for outputObject and write aggregation query accordingly.

You can now process the outputObject to inject the query operators and transform the keys to match the fields.

Something like below.

{
    "top": {
      "place": {
        "$in": [
          "A"
        ]
      }
    },
    "testing": {
      "tests.name": {
        "$in": [
          "1"
        ]
      },
      "tests.thing": {
        "$in": [
          "X",
          "Y"
        ]
      }
    }
  }

JS Code

var top = outputObject.top;
Object.keys(top).forEach(function(a) {
    top[a] = {
        "$in": top[a]
    };
});

var testing = outputObject.testing;
Object.keys(testing).forEach(function(a) {
    Object.keys(testing[a]).forEach(function(b) {
        var c = [a + "." + b];
        testing[c] = {
            "$in": testing[a][b]
        };
    })
    delete testing[a];
});

You can now use your aggregation query

db.test.aggregate([{
        $match: top
    },
    {
        $unwind: "$tests"
    },
    {
        $match: testing
    },
    {
        $unwind: "$tests.evaluation"
    },
    {
        $group: {
            _id: null,
            uniqueValues: {
                $addToSet: "$tests.evaluation._id"
            }
        }
    }
])

You can refactor your code to use below aggregation pipeline in 3.4

Process your output object ( includes $in operator ) to

{
  "top": {
    "place": {
      "$in": [
        "A"
      ]
    }
  },
  "testing": {
    "tests": {
      "name": [
        "1"
      ],
      "thing": [
        "X",
        "Y"
      ]
    }
  }
};

JS Code

var top = outputObject.top;
Object.keys(top).forEach(function(a) {top[a] = {"$in":top[a]};});

Aggregation:

[
  {
    "$match": top
  },
  {
    "$addFields": {
      "tests": {
        "$filter": {
          "input": "$$tests",
          "as": "res",
          "cond": {
            "$and": [
              {
                "$in": [
                  "$$res.name",
                  outputObject.testing.tests.name
                ]
              },
              {
                "$in": [
                  "$$res.thing",
                  outputObject.testing.tests.thing
                ]
              }
            ]
          }
        }
      }
    }
  },
  {
    "$unwind": "$tests.evaluation"
  },
  {
    "$group": {
      "_id": null,
      "uniqueValues": {
        "$addToSet": "$tests.evaluation._id"
      }
    }
  }
]  
s7vr
  • 73,656
  • 11
  • 106
  • 127
  • Unfortunately, the object is not fixed. Sometimes, it will be `var outputObject = { "top": { "place": [ "A" ] }, "testing": { "tests": { "name": [ "1" ] } } }` and there are significantly more components to the query. Manually adding them together is not something I can do. – black_sheep07 Jul 07 '17 at 15:31
  • You can dynamically create the whole pipeline. You can first process the output objects fields and add necessary query operators and then based on the output fields you can adjust your aggregation stages and assemble into aggregation pipeline.Here is one such example https://stackoverflow.com/questions/43889978/mongoose-how-to-write-a-query-with-if-condition/43959932#43959932 – s7vr Jul 07 '17 at 15:34
  • I'm looking at the example and it seems that I'm going to have to create one of those operators for every level of the query. For example, I'll have to create one for `testing.tests.name`, another for `testing.tests.thing`, and yet another for `top.place`. Is this accurate? – black_sheep07 Jul 07 '17 at 15:40
  • It really depends on aggregation query. The query I have provided you don't need operators for `testing` part but needs operator for matching query part. As long as you can agree on one fixed output field format you can adjust based on your query. – s7vr Jul 07 '17 at 15:42
  • I've updated answer to include javascript code to transform your `outputObject` for both your query and for provided answer. This should give you some idea as to how to deal with dynamic query. – s7vr Jul 07 '17 at 18:49
2

You have a couple of problems here. Firstly the array arguments in your input value should rather be compared with $in which many "any of these in the list" in order to match.

The second problem is that that since the paths are "nested" here you actually need to transform to "dot notation" otherwise you have another variant of the first problem where the conditions would be looking in the "test" array for elements that only have the supplied fields you specify in the input.

So unless you "dot notate" the path as well then since your array items also contain "evaluation" which is not supplied in the input, then it would not match as well.

The other issue here, but easily corrected is the "top" and "testing" separation here is not actually needed. Both conditions actually apply within "both" the $match stages in your pipeline. So you could in fact "flatten" that, as the example shows:

var outputObject = {
        "top" : {
                "place" : [
                        "A"
                ]
        },
        "testing" : {
                "tests" : {
                        "name" : [
                                "1"
                        ],
                        "thing" : [
                                "X",
                                "Y"
                        ]
                }
        }
};

function dotNotate(obj,target,prefix) {
  target = target || {},
  prefix = prefix || "";

  Object.keys(obj).forEach(function(key) {
    if ( Array.isArray( obj[key] ) ) {
      return target[prefix + key] = { "$in": obj[key] };
    } else if ( typeof(obj[key]) === "object" ) {
      dotNotate(obj[key],target,prefix + key + ".");
    } else {
      return target[prefix + key] = obj[key];
    }
  });

  return target;
}

// Run the transformation
var queryObject = dotNotate(Object.assign(outputObject.top,outputObject.testing));

This produces queryObject which now looks like:

{
    "place" : {
        "$in" : [ 
            "A"
        ]
    },
    "tests.name" : {
        "$in" : [ 
            "1"
        ]
    },
    "tests.thing" : {
        "$in" : [ 
            "X", 
            "Y"
        ]
    }
}

And then you can run the aggregation:

db.test.aggregate([
  { '$match': queryObject },
  { '$unwind': "$tests" },
  { '$match': queryObject },
  { '$unwind': "$tests.evaluation" },
  { '$group': {
    '_id': null,
    'uniqueValues': {
      '$addToSet': "$tests.evaluation._id"
    }
  }}
])

Which correctly filters the objects

{
    "_id" : null,
    "uniqueValues" : [ 
        ObjectId("58f79487bebac50d5b2ae7f1"), 
        ObjectId("58f79159bebac50d5b2ae75c"), 
        ObjectId("58f782fbbebac50d5b2ae558"), 
        ObjectId("58f78c37bebac50d5b2ae704"), 
        ObjectId("58f78525bebac50d5b2ae5c9"), 
        ObjectId("58f78695bebac50d5b2ae60e")
    ]
}

Please note that the conditions you supply here actually matches all documents and array entries you supplied in your question anyway. But it will of course actually remove anything that does not match.

Also ideally the "initial" query would rather use $elemMatch

{
    "place" : {
        "$in" : [ 
            "A"
        ]
    },
    "tests": {
      "$elemMatch": {
        "name" : { "$in" : [ "1" ] },
        "thing" : { "$in" : [ "X", "Y" ] }
      }
    }
}

Which would actually filter all of the documents properly in the initial query stage, since it would only select documents that actually had array elements which did in fact match "only" those conditions as opposed to the dot notated form in the "initial" query which would also return documents where the notated conditions for the "test" array were met in "any element" instead of "both conditions" on the element. But that may be another exercise to consider as the restructured query can apply to both the initial and "inner" filters without the $elemMatch.


Actually with thanks to this nice solution to a "Deep Object Merge" without additional library dependencies, you can use the $elemMatch like this:

var outputObject = {
        "top" : {
                "place" : [
                        "A"
                ]
        },
        "testing" : {
                "tests" : {
                        "name" : [
                                "1"
                        ],
                        "thing" : [
                                "X",
                                "Y"
                        ]
                }
        }
};

function dotNotate(obj,target,prefix) {
  target = target || {},
  prefix = prefix || "";

  Object.keys(obj).forEach(function(key) {
    if ( Array.isArray( obj[key] ) ) {
      return target[prefix + key] = { "$in": obj[key] };
    } else if ( typeof(obj[key]) === "object" ) {
      dotNotate(obj[key],target,prefix + key + ".");
    } else {
      return target[prefix + key] = obj[key];
    }
  });

  return target;
}

function isObject(item) {
  return (item && typeof item === 'object' && !Array.isArray(item));
}

function mergeDeep(target, ...sources) {
  if (!sources.length) return target;
  const source = sources.shift();

  if (isObject(target) && isObject(source)) {
    for (var key in source) {
      if (isObject(source[key])) {
        if (!target[key]) Object.assign(target, { [key]: {} });
        mergeDeep(target[key], source[key]);
      } else {
        Object.assign(target, { [key]: source[key] });
      }
    }
  }

  return mergeDeep(target, ...sources);
}

var queryObject = dotNotate(Object.assign(outputObject.top,outputObject.testing));

// Replace dot with $elemMatch
var initialQuery = Object.keys(queryObject).map( k => (
  ( k.split(/\./).length > 1 )
   ? { [k.split(/\./)[0]]: { "$elemMatch": { [k.split(/\./)[1]]: queryObject[k] } } }
   : { [k]: queryObject[k] }
)).reduce((acc,curr) => mergeDeep(acc,curr),{})

db.test.aggregate([
  { '$match': initialQuery },
  { '$unwind': "$tests" },
  { '$match': queryObject },
  { '$unwind': "$tests.evaluation" },
  { '$group': {
    '_id': null,
    'uniqueValues': {
      '$addToSet': "$tests.evaluation._id"
    }
  }}
])

With the pipeline being sent to the server as:

[
    {
        "$match" : {
            "place" : {
                "$in" : [ 
                    "A"
                ]
            },
            "tests" : {
                "$elemMatch" : {
                    "name" : {
                        "$in" : [ 
                            "1"
                        ]
                    },
                    "thing" : {
                        "$in" : [ 
                            "X", 
                            "Y"
                        ]
                    }
                }
            }
        }
    },
    {
        "$unwind" : "$tests"
    },
    {
        "$match" : {
            "place" : {
                "$in" : [ 
                    "A"
                ]
            },
            "tests.name" : {
                "$in" : [ 
                    "1"
                ]
            },
            "tests.thing" : {
                "$in" : [ 
                    "X", 
                    "Y"
                ]
            }
        }
    },
    {
        "$unwind" : "$tests.evaluation"
    },
    {
        "$group" : {
            "_id" : null,
            "uniqueValues" : {
                "$addToSet" : "$tests.evaluation._id"
            }
        }
    }
]

Also your $group is probably better written as:

{ "$group": { "_id": "$tests.evaluation._id" } }

Which returns "distinct" just like $addToSet does, but also puts the output into separate documents, instead of trying to combine into "one" which is probably not the best practice and could in extreme cases break the BSON limit of 16MB. So it is generally better to obtain "distinct" in that way instead.

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
  • Using your solution works very well, except in the case where there are no values in the `top`. For example, when `var outputObject = { "testing" : { "tests" : { "name" : [ "1" ], "thing" : [ "X", "Y" ] } } };` the solution fails. Is there a way to make it function regardless of whether there is something in the `top`? – black_sheep07 Jul 12 '17 at 17:51
  • @black_sheep07 If you actually read the content I tell you twice that your presumption of needing separate "top" and "bottom" sections is incorrect and in fact you need the conditions to apply both initially and in later filtering. I also remember speaking to you before about [when you have a different question to what you ask then you ask it separately instead](https://stackoverflow.com/questions/ask). – Neil Lunn Jul 12 '17 at 22:18
  • 1
    @black_sheep07 In other words that's a "single statement in the code" `Object.assign(outputObject.top,outputObject.testing)` which is the only thing referencing the two keys which I am saying you should "get rid of" because there is no need to have them there at all. In fact what that statment does is "remove" those keys by "merging" the results from both keys as a single document. – Neil Lunn Jul 12 '17 at 22:30
  • I didn't originally understand what you meant by flattening it because you said that you used a flattened data set, but the `outputObject` used in your example is identical to the one I used. However, with your additional explanation, I get it. I can modify the structure of the incoming data set so that it is not dependent on `.top` existing. Thank you for all your help. – black_sheep07 Jul 12 '17 at 22:33