2

The question is the next one:

Get documents with tags in list, ordered by total number of matches

But they say that is possible using Aggregation Framework, it's possible?

Community
  • 1
  • 1
Wiliam
  • 3,714
  • 7
  • 36
  • 56

2 Answers2

8

Yes, it's possible using Aggregation Framework.

Assumptions

Query

This approach forces you to unwind the results and reevaluate the match predicate with unwinded results, so its really inefficient.

db.test_col.aggregate(
    {$match: {tags: {$in: ["shirt","cotton","black"]}}}, 
    {$unwind: "$tags"}, 
    {$match: {tags: {$in: ["shirt","cotton","black"]}}}, 
    {$group: {
        _id:{"_id":1}, 
        matches:{$sum:1}
    }}, 
    {$sort:{matches:-1}}
);

Expected Results

{
    "result" : [
        {
            "_id" : {
                "_id" : ObjectId("5051f1786a64bd2c54918b26")
            },
            "matches" : 3
        },
        {
            "_id" : {
                "_id" : ObjectId("5051f1726a64bd2c54918b24")
            },
            "matches" : 2
        },
        {
            "_id" : {
                "_id" : ObjectId("5051f1756a64bd2c54918b25")
            },
            "matches" : 1
        }
    ],
    "ok" : 1
}
Community
  • 1
  • 1
Samuel García
  • 2,199
  • 14
  • 21
  • Inefficient? I'm making a suggest, I need it to be as fast as possible, do you think that this can be done in a better way? – Wiliam Sep 13 '12 at 15:04
  • Nope, there is no better way, using this schema, to do it. It is really much faster than MR approach. Note that if you have millons of (matching) documents, this unwinds it creating, in memory, millions*avg_tags_size. Aggregation Framework have memory usage limits. Use it with caution. – Samuel García Sep 13 '12 at 15:12
  • 1
    I tried it, I'm using v2.2, so I had to change "_id:{"_id":1}" for "_id:{"_id":"$_id"}" and removed second $match. It worked fine, but is not fast enough, thanks :) (PHP CODE: https://gist.github.com/41d31bccd0cc3814fdda) – Wiliam Sep 13 '12 at 22:53
  • Yes, i used 2.1 (dev version) syntax. In MongoDB 2.2 group _id declaration had changed. I edited according your comment. – Samuel García Sep 14 '12 at 07:09
  • Did you ensure index on "tags" attr? How much docs you have on your collection? – Samuel García Sep 14 '12 at 07:23
  • I tested this again and the second match is needed. If you have a match document with tags:["shirt", "house", "dog", "car", "sky"] without the second match clause, this will be sorted first and only one tag really matches. – Samuel García Sep 14 '12 at 12:23
  • The document has 120.000 documents with an average of 3-5 tags per document. Tags has a index. It takes ~5s and with normal find and $all operator takes <1ms. – Wiliam Sep 14 '12 at 13:58
  • How much docs your query matches? It's normal it takes more time since second match dont use any índex at all. Group algorithm isnt fast neither. – Samuel García Sep 14 '12 at 14:54
  • 2 or 3 with regex, but with this example I tried exact match. Well, don't worry, you answered my question, thanks. About the index, I asked that in another question, I solved it in client side so no problem :) http://stackoverflow.com/questions/12416173/optimize-array-query-match-with-operator-all-in-mongodb – Wiliam Sep 15 '12 at 08:56
  • nicely done. loved the idea. used it someplace else. thanks – Visakh Vijayan Jun 21 '21 at 12:25
0

Using $size and $setIntersection will solve this efficiently, without causing memory multiplication.

tagList = ['shirt', 'cotton', 'black']

db.test_col.aggregate(
    {$match: {tags: {$in: ["shirt","cotton","black"]}}}, 
    {$project: 
        {"title":1,"tags":1}, 
        {$order: 
            {"$size": 
                {"$setIntersection": [ tagList, "$tags" ]}}, 
    {$sort:{order:-1}}
    );
  1. First we match the documents which have at least one element matching.

  2. Then we project Keys/Columns we need along with a new order key/column. Order is generated by taking the count of intersected elements between 'tags in db' and 'tags from query'.

  3. Then we do a simple sort in descending order. This worked for me. Similar Question answered here

Community
  • 1
  • 1