Theres a reason populate()
cannot work
Using populate()
you won't be able to do this, and for a number of reasons. The main reason being that all populate()
is doing is essentially marrying up your foreign reference to results from another collection with given query parameters.
In fact with a $near
query, the results could be quite weird, since you might not receive enough "near" results to actually marry up with all the parent references.
There's a bit more detail about the "foreign constraint" limitation with populate()
in existing answers to Querying after populate in Mongoose and of course on the modern solution to this, which is $lookup
.
Using $lookup
and $geoNear
In fact, what you need is a $lookup
along with a $geoNear
, but you also must do the "join" the other way around to what you might expect. And thus from the Agency
model you would do:
Agency.aggregate([
// First find "near" agencies, and project a distance field
{ "$geoNear": {
"near": {
"type": "Point",
"coordinates": [ longitude , latitude ]
},
"distanceField": "distance",
"spherical" true
}},
// Then marry these up to Cars - which can be many
{ "$lookup": {
"from": Car.collection.name,
"let": { "agencyId": "$_id" },
"pipeline": [
{ "$match": {
"disponible": true,
"$expr": { "$eq": [ "$$agencyId", "$agency" ] }
}}
],
"as": "cars"
}},
// Unwinding denormalizes that "many"
{ "$unwind": "$cars" },
// Group is "inverting" the result
{ "$group": {
"_id": "$cars._id",
"car": { "$first": "$cars" },
"agency": {
"$first": {
"$arrayToObject": {
"$filter": {
"input": { "$objectToArray": "$$ROOT" },
"cond": { "$ne": [ "$$this.k", "cars" ] }
}
}
}
}
}},
// Sort by distance, nearest is least
{ "$sort": { "agency.distance": 1 } },
// Reformat to expected output
{ "$replaceRoot": {
"newRoot": {
"$mergeObjects": [ "$car", { "agency": "$agency" } ]
}
}}
])
As stated the $geoNear
part must come first. Bottom line is it basically needs to be the very first stage in an aggregation pipeline in order to use the mandatory index for such a query. Though it is true that given the form of $lookup
shown here you "could" actually use a $near
expression within the $lookup
pipeline
with a starting $match
stage, it won't return what you expect since basically the constraint is already on the matching _id
value. And it's really just the same problem populate()
has in that regard.
And of course though $geoNear
has a "query"
constraint, you cannot use $expr
within that option so this rules out that stage being used inside the $lookup
pipeline
again. And yes, still basically the same problem of conflicting constraints.
So this means you $geoNear
from your Agency
model instead. This pipeline stage has the additional thing it does which is it actually projects a "distanceField"
into the result documents. So a new field within the documents ( called "distance"
in the example ) will then indicate how far away from the queried point the matched document is. This is important for sorting later.
Of course you want this "joined" to the Car
, so you want to do a $lookup
. Note that since MongoDB has no knowledge of mongoose models the $lookup
pipeline stage expects the "from"
to be the actual collection name on the server. Mongoose models typically abstract this detail away from you ( though it's normally the plural of the model name, in lowercase ), but you can always access this from the .collection.name
property on the model as shown.
The other arguments are the "let"
in which you keep a reference to the _id
of the current Agency
document. This is used within the $expr
of the $match
in order to compare the local and foreign keys for the actual "joining" condition. The other constraints in the $match
further filter down the matching "cars" to those criteria as well.
Now it's probably likely there are in fact many cars to each agency and that is one basic reason the model has been done like this in separate collections. Regardless of whether it's one to one or one to many, the $lookup
result always produces an array. Basically we now want this array to "denormalize" and essentially "copy" the Agency
detail for each found Car
. This is where $unwind
comes in. An added benefit is that when you $unwind
the array of matching "cars", any empty array where the contraints did not match anything effectively removes the Agency
from the possible results altogether.
Of course this is the the wrong way around from how you actually want the results, as it's really just "one car" with "one agency". This is where $group
comes in and collects information "per car". Since this way around it is expected as "one to one", the $first
operator is used as an accumulator.
There is a fancy expression in there with $objectToArray
and $arrayToObject
, but really all that is doing is removing the "cars"
field from the "agency"
content, just as the "$first": "$cars"
is keeping that data separate.
Back to something closer to the desired output, the other main thing is to $sort
the results so the "nearest" results are the ones listed first, just as the initial goal was all along. This is where you actually use the "distance"
value which was added to the document in the original $geoNear
stage.
At this point you are nearly there, and all that is needed is to reform the document into the expected output shape. The final $replaceRoot
does this by taking the "car"
value from the earlier $group
output and promoting it to the top level object to return, and "merging" in the "agency"
field to appear as part of the Car
itself. Clearly $mergeObjects
does the actual "merging".
That's it. It does work, but you may have spotted the problem that you don't actually get to say "near to this AND with this other constraint" technically as part of a single query. And a funny thing about "nearest" results is they do have an in-buit "limit" on results they should return.
And that is basically in the next topic to discuss.
Changing the Model
Whilst all the above is fine, it's still not really perfect and has a few problems. The most notable problem should be that it's quite complex and that "joins" in general are not good for performance.
The other considerable flaw is that as you might have gathered from the "query"
parameter on the $geoNear
stage, you are not really getting the equivalent of both conditions ( find nearest agency to AND car has disponible: true ) since on separate collections the initial "near" does not consider the other constraint.
Nor can this even be done from the original order just as was intended, and again comes back to the problem with populate()
here.
So the real issue unfortunately is design. And it may be a difficult pill to swallow, but the current design which is extremely "relational" in nature is simply not a good fit for MongoDB in how it would handle this type of operation.
The core problem is the "join", and in order to make things work we basically need to get rid of it. And you do that in MongoDB design by embedding the document instead of keeping a reference in another collection:
const carSchema = new Schema({
name: { type: String, required: true},
agency: {
name: { type: String, required: true},
location: {
type: {
type: String,
enum: ['Point'],
default: 'Point'
},
coordinates: {
type: [Number],
required: true
}
}
}
}, { timestamps: true });
In short "MongoDB is NOT a relational database", and it also does not really "do joins" as the sort of itegral constraint over a join you are looking for simply is not supported.
Well, it's not supported by $lookup
and the ways it will do things, but the official line has been and will always be that a "real join" in MongoDB is embedded detail. Which simply means "if it's meant to be a constraint on queries you want to do, then it belongs in the same document".
With that redesign the query simply becomes:
Car.find({
disponible: true,
"agency.location": {
$near: {
$geometry: {
coordinates: [ latitude , longitude ]
},
}
}
})
YES, that would mean that you likely duplicate a lot of information about an "agency" since the same data would likely be present on many cars. But the facts are that for this type of query usage, this is actually what MongoDB is expecting you to model as.
Conclusion
So the real choices here come down to which case suits your needs:
Accept that you are possibly returning less than the expected results due to "double filtering" though the use of a $geoNear
and $lookup
combination. Noting that $geoNear
will only return 100 results by default, unless you change that. This can be an unreliable combination for "paged" results.
Restructure your data accepting the "duplication" of agency detail in order to get a proper "dual constraint" query since both criteria are in the same collection. It's more storage and maintenance, but it is more performant and completely reliable for "paged" results.
And of course if it's neither acceptable to use the aggregation approach shown or the restructure of data, then this can only show that MongoDB is probably not best suited to this type of problem, and you would be better off using an RDBMS where you decide you must keep normalized data as well as be able to query with both constraints in the same operation. Provided of course you can choose an RDBMS which actually supports the usage of such GeoSpatial queries along with "joins".