Mongo Sort by Count of Matches in Array

Question

Lets say my test data is

db.multiArr.insert({"ID" : "fruit1","Keys" : ["apple", "orange", "banana"]})
db.multiArr.insert({"ID" : "fruit2","Keys" : ["apple", "carrot", "banana"]})

to get individual fruit like carrot i do

db.multiArr.find({'Keys':{$in:['carrot']}})

when i do an or query for orange and banana, i see both the records fruit1 and then fruit2

db.multiArr.find({ $or: [{'Keys':{$in:['carrot']}}, {'Keys':{$in:['banana']}}]})

Result of the output should be fruit2 and then fruit1, because fruit2 has both carrot and banana

`i see both the records fruit1 and then fruit2` - then you say `should be fruit2 and then fruit1` - you're getting exactly what you want?! — Alex, May 25 '17 at 14:06
@Alex The question also said *" ... output should be fruit2 and then fruit1, because fruit2 has both carrot and banana"*. And that is the key that this is asking for a "weighted sort" of the results and not just returning both documents because they both matched. — Neil Lunn, May 25 '17 at 23:33

Neil Lunn · Accepted Answer · 2017-05-29T07:08:44.517

To actually answer this first, you need to "calculate" the number of matches to the given condition in order to "sort" the results to return with the preference to the most matches on top.

For this you need the aggregation framework, which is what you use for "calculation" and "manipulation" of data in MongoDB:

db.multiArr.aggregate([
  { "$match": { "Keys": { "$in": [ "carrot", "banana" ] } } },
  { "$project": {
    "ID": 1,
    "Keys": 1,
    "order": {
      "$size": {
        "$setIntersection": [ ["carrot", "banana"], "$Keys" ]
      }
    }
  }},
  { "$sort": { "order": -1 } }
])

On an MongoDB older than version 3, then you can do the longer form:

db.multiArr.aggregate([
  { "$match": { "Keys": { "$in": [ "carrot", "banana" ] } } },
  { "$unwind": "$Keys" },
  { "$group": {
    "_id": "$_id",
    "ID": { "$first": "$ID" },
    "Keys": { "$push": "$Keys" },
    "order": {
      "$sum": {
        { "$cond": [
          { "$or": [
           { "$eq": [ "$Keys", "carrot" ] },
           { "$eq": [ "$Keys", "banana" ] }
         ]},
         1,
         0
        ]}
      }
    }
  }},
  { "$sort": { "order": -1 } }
])

In either case the function here is to first match the possible documents to the conditions by providing a "list" of arguments with $in. Once the results are obtained you want to "count" the number of matching elements in the array to the "list" of possible values provided.

In the modern form the $setIntersection operator compares the two "lists" returning a new array that only contains the "unique" matching members. Since we want to know how many matches that was, we simply return the $size of that list.

In older versions, you pull apart the document array with $unwind in order to perform operations on it since older versions lacked the newer operators that worked with arrays without alteration. The process then looks at each value individually and if either expression in $or matches the possible values then the $cond ternary returns a value of 1 to the $sum accumulator, otherwise 0. The net result is the same "count of matches" as shown for the modern version.

The final thing is simply to $sort the results based on the "count of matches" that was returned so the most matches is on "top". This is is "descending order" and therefore you supply the -1 to indicate that.

Addendum concerning $in and arrays

You are misunderstanding a couple of things about MongoDB queries for starters. The $in operator is actually intended for a "list" of arguments like this:

{ "Keys": { "$in": [ "carrot", "banana" ] } }

Which is essentially the shorthand way of saying "Match either 'carrot' or 'banana' in the property 'Keys'". And could even be written in long form like this:

{ "$or": [{ "Keys": "carrot" }, { "Keys": "banana" }] }

Which really should lead you to if it were a "singular" match condition, then you simply supply the value to match to the property:

{ "Keys": "carrot" }

So that should cover the misconception that you use $in to match a property that is an array within a document. Rather the "reverse" case is the intended usage where instead you supply a "list of arguments" to match a given property, be that property an array or just a single value.

The MongoDB query engine makes no distinction between a single value or an array of values in an equality or similar operation.

Mongo Sort by Count of Matches in Array

1 Answers1

Addendum concerning $in and arrays

Linked

Related