1

Trying to sort by two fields, giving priority to first - but instead seems like priority is given to the second (lastName), in the majority of cases. Occasionally, it works correctly - not sure what makes the change.

The same command works fine using Robo3t as client, connecting with the same database.

Running Flask and using Mongoengine, with the code below:

pipeline = [{"$sort": {"importance": -1, "lastName": 1}}, {"$match": {"visible": True, "editionId": "2017"}},
            {"$project": {'firstName': 1, 'lastName': 1, 'biography': 1, 'position': 1, 'workAt': 1, '_id': 0,
                          "imageUrl": 1}}]
doc = Speakers.objects.aggregate(*pipeline)

Same code (except 'True' becomes 'true'), in Robo3t:

db.getCollection('speakers').aggregate([{"$sort": {"importance": -1, "lastName": 1}}, {"$match": {"visible": true, "editionId": "2017"}},
            {"$project": {'firstName': 1, 'lastName': 1, 'biography': 1, 'position': 1, 'workAt': 1, '_id': 0,
                          "imageUrl": 1}}], {"collation": {"locale": "en"}}
      )

Indexes on mongodb (3.4.9) look as below:

> db.speakers.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "website_v3.speakers"
    }
]

Using db version v3.4.9

I had even tried creating an index (collation) for importance, but exactly the same happened. First it works, then it doesn't.

Any ideas welcome!

EDIT:

It seems like flask_mongoengine is messing with the order of $sort, as well as $project.

mongo
> db.system.profile.find().pretty()
{
    "op" : "query",
    "ns" : "website_v3.system.profile",
    "query" : {
        "find" : "system.profile",
        "filter" : {

        }
    },
    "keysExamined" : 0,
    "docsExamined" : 0,
    "cursorExhausted" : true,
    "numYield" : 0,
    "locks" : {
        "Global" : {
            "acquireCount" : {
                "r" : NumberLong(2)
            }
        },
        "Database" : {
            "acquireCount" : {
                "r" : NumberLong(1)
            }
        },
        "Collection" : {
            "acquireCount" : {
                "r" : NumberLong(1)
            }
        }
    },
    "nreturned" : 0,
    "responseLength" : 98,
    "protocol" : "op_command",
    "millis" : 0,
    "planSummary" : "COLLSCAN",
    "execStats" : {
        "stage" : "COLLSCAN",
        "nReturned" : 0,
        "executionTimeMillisEstimate" : 0,
        "works" : 2,
        "advanced" : 0,
        "needTime" : 1,
        "needYield" : 0,
        "saveState" : 0,
        "restoreState" : 0,
        "isEOF" : 1,
        "invalidates" : 0,
        "direction" : "forward",
        "docsExamined" : 0
    },
    "ts" : ISODate("2017-10-19T20:55:30.415Z"),

}
{
    "op" : "command",
    "ns" : "website_v3.speakers",
    "command" : {
        "aggregate" : "speakers",
        "pipeline" : [
            {
                "$match" : {
                    "visible" : true,
                    "editionId" : "2017"
                }
            },
            {
                "$sort" : {
                    "lastName" : 1,
                    "importance" : -1
                }
            },
            {
                "$project" : {
                    "firstName" : 1,
                    "position" : 1,
                    "biography" : 1,
                    "workAt" : 1,
                    "imageUrl" : 1,
                    "_id" : 0,
                    "lastName" : 1
                }
            }
        ],
        "cursor" : {

        }
    },
    "cursorid" : 168077246844,
    "keysExamined" : 0,
    "docsExamined" : 541,
    "hasSortStage" : true,
    "numYield" : 4,
    "locks" : {
        "Global" : {
            "acquireCount" : {
                "r" : NumberLong(16)
            }
        },
        "Database" : {
            "acquireCount" : {
                "r" : NumberLong(8)
            }
        },
        "Collection" : {
            "acquireCount" : {
                "r" : NumberLong(7)
            }
        }
    },
    "nreturned" : 101,
    "responseLength" : 157635,
    "protocol" : "op_query",
    "millis" : 2,
    "planSummary" : "COLLSCAN",
    "ts" : ISODate("2017-10-19T20:55:57.691Z"),
}
{
    "op" : "getmore",
    "ns" : "website_v3.speakers",
    "query" : {
        "getMore" : NumberLong("168077246844"),
        "collection" : "speakers"
    },
    "originatingCommand" : {
        "aggregate" : "speakers",
        "pipeline" : [
            {
                "$match" : {
                    "visible" : true,
                    "editionId" : "2017"
                }
            },
            {
                "$sort" : {
                    "lastName" : 1,
                    "importance" : -1
                }
            },
            {
                "$project" : {
                    "firstName" : 1,
                    "position" : 1,
                    "biography" : 1,
                    "workAt" : 1,
                    "imageUrl" : 1,
                    "_id" : 0,
                    "lastName" : 1
                }
            }
        ],
        "cursor" : {

        }
    },
    "cursorid" : 168077246844,
    "keysExamined" : 0,
    "docsExamined" : 0,
    "hasSortStage" : true,
    "cursorExhausted" : true,
    "numYield" : 0,
    "locks" : {
        "Global" : {
            "acquireCount" : {
                "r" : NumberLong(4)
            }
        },
        "Database" : {
            "acquireCount" : {
                "r" : NumberLong(2)
            }
        },
        "Collection" : {
            "acquireCount" : {
                "r" : NumberLong(2)
            }
        }
    },
    "nreturned" : 204,
    "responseLength" : 303603,
    "protocol" : "op_query",
    "millis" : 0,
    "planSummary" : "COLLSCAN",
    "ts" : ISODate("2017-10-19T20:55:57.702Z"),
}
gsapu
  • 11
  • 4

1 Answers1

0

MongoDB will execute the aggregation stages (in most cases) in the exact order you specify. The $match operator does not give any guarantee of the output order.

So you would want to put your $match at the start of your pipeline (followed by the $sort) which is a good idea for performance reasons anyway.

dnickless
  • 10,733
  • 1
  • 19
  • 34
  • I've tried to put `$match` then `$sort` as you suggested, but get same result. Items are sorted by `lastName` and not `importance`. The strange thing is that it works perfectly fine in Robo3T, either way. – gsapu Oct 18 '17 at 21:57
  • May I suggest you log your queries on the server end and compare them. If flask messes with the query that could explain it. If, however the exact same queries get logged using Robo3T and your app then the problem must be somewhere else: https://stackoverflow.com/questions/15204341/mongodb-logging-all-queries – dnickless Oct 19 '17 at 05:50
  • Other potential sources of confusion could be collation settings, mixed field types (e.g. int and string for importance), null values, superfluous space characters at the start of values or typos in field names. Lastly you might want to test alternative ways of sorting, e.g. just using `find()` or without the other two stages to isolate the problem. – dnickless Oct 19 '17 at 05:56
  • Plus, I wonder, how you detect the incorrect sorting - does anything in your app do something with the retrieved data that might change the order? – dnickless Oct 19 '17 at 05:57