I have an Item collection which could hold thousands to hundreds of thousands of documents. On that collection, I want to perform geospatial queries. Using Mongoose, there are two options: find() and the Aggregation Pipeline. I have displayed my implementations of both below:
Mongoose Model
To start, here are the relevant properties of my Mongoose Model:
// Assumes Mongoose is installed and a connection is opened elsewhere
const mongoose = require('mongoose');

// Define the schema
const itemSchema = new mongoose.Schema({
  // Firebase UID (in addition to the Mongo ObjectID)
  owner: {
    type: String,
    required: true,
    ref: 'User'
  },
  // ... Some more fields
  numberOfViews: {
    type: Number,
    required: true,
    default: 0
  },
  numberOfLikes: {
    type: Number,
    required: true,
    default: 0
  },
  // GeoJSON Point; coordinates are stored as [longitude, latitude]
  location: {
    type: {
      type: String,
      default: 'Point',
      required: true
    },
    coordinates: {
      type: [Number],
      required: true
    }
  }
}, {
  timestamps: true
});

// 2dsphere index
itemSchema.index({ location: '2dsphere' });

// Create the model
const Item = mongoose.model('Item', itemSchema);
Find Query
// These variables are populated based on URL Query Parameters.
const match = {};
const sort = {};

// Query to make.
const query = {
  location: {
    $near: {
      // Meters, because the point is GeoJSON on a 2dsphere index
      $maxDistance: parseInt(req.query.maxDistance, 10),
      $geometry: {
        type: 'Point',
        // parseFloat keeps the fractional degrees; parseInt would truncate them
        coordinates: [parseFloat(req.query.lng), parseFloat(req.query.lat)]
      }
    }
  },
  ...match
};

// Pagination and Sorting
const options = {
  limit: parseInt(req.query.limit, 10),
  skip: parseInt(req.query.skip, 10),
  sort
};

const items = await Item.find(query, undefined, options).lean().exec();
res.send(items);
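Since longitude and latitude are fractional degrees, here is a minimal sketch of how the URL parameters could be parsed and range-checked before building the $near filter. The helper name buildNearQuery and the error messages are hypothetical and not part of my actual code; it assumes maxDistance is given in meters.

```javascript
// Hypothetical helper (an assumption, not part of the original code):
// parses the geospatial URL parameters and builds the $near filter.
function buildNearQuery(params) {
  // parseFloat preserves fractional degrees such as -110.8571443
  const lng = parseFloat(params.lng);
  const lat = parseFloat(params.lat);
  const maxDistance = parseFloat(params.maxDistance);

  if (!Number.isFinite(lng) || lng < -180 || lng > 180) {
    throw new RangeError('lng must be a number between -180 and 180');
  }
  if (!Number.isFinite(lat) || lat < -90 || lat > 90) {
    throw new RangeError('lat must be a number between -90 and 90');
  }
  if (!Number.isFinite(maxDistance) || maxDistance < 0) {
    throw new RangeError('maxDistance must be a non-negative number');
  }

  return {
    location: {
      $near: {
        $maxDistance: maxDistance,
        // GeoJSON order is [longitude, latitude]
        $geometry: { type: 'Point', coordinates: [lng, lat] }
      }
    }
  };
}
```

The result could then be merged with the other filters, for example { ...buildNearQuery(req.query), ...match }.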
Aggregation Pipeline
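Suppose the distance to each item also needs to be calculated:
// These variables are populated based on URL Query Parameters.
const query = {};
const sort = {};

const geoSpatialQuery = {
  $geoNear: {
    near: {
      type: 'Point',
      // parseFloat keeps the fractional degrees; parseInt would truncate them
      coordinates: [parseFloat(req.query.lng), parseFloat(req.query.lat)]
    },
    distanceField: 'distance',
    maxDistance: parseInt(req.query.maxDistance, 10),
    query,
    spherical: true
  }
};

const items = await Item.aggregate([
  // $geoNear must be the first stage of the pipeline
  geoSpatialQuery,
  // Sort the full result set, then skip, then limit, so each page is
  // a slice of the sorted results rather than a re-sorted slice
  { $sort: { distance: -1, ...sort } },
  { $skip: parseInt(req.query.skip, 10) },
  { $limit: parseInt(req.query.limit, 10) }
]).exec();
res.send(items);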
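One detail about the pagination parameters: parseInt returns NaN when a parameter is missing, and $limit only accepts a positive integer while $skip needs a non-negative one. Below is a rough sketch of how the values could be defaulted and the pipeline assembled. The helper names parsePagination and buildGeoNearPipeline, the default of 20 and the cap of 100 are all assumptions for illustration. Because $geoNear already returns documents ordered from nearest to farthest, this sketch adds no $sort stage.

```javascript
// Hypothetical defaults; the values are assumptions for illustration.
const DEFAULT_LIMIT = 20;
const MAX_LIMIT = 100;

// Hypothetical helper: turns raw URL parameters into safe skip/limit values.
function parsePagination(params) {
  const limit = parseInt(params.limit, 10);
  const skip = parseInt(params.skip, 10);
  return {
    // $limit requires a positive integer, so fall back to a default and cap it
    limit: Number.isInteger(limit) && limit > 0 ? Math.min(limit, MAX_LIMIT) : DEFAULT_LIMIT,
    // $skip requires a non-negative integer
    skip: Number.isInteger(skip) && skip >= 0 ? skip : 0
  };
}

// Hypothetical helper: assembles the stages in a paging-friendly order.
function buildGeoNearPipeline({ lng, lat, maxDistance, filter = {} }, { skip, limit }) {
  return [
    {
      // $geoNear has to be the first stage of an aggregation pipeline
      $geoNear: {
        near: { type: 'Point', coordinates: [lng, lat] },
        distanceField: 'distance',
        maxDistance,
        query: filter,
        spherical: true
      }
    },
    // Skip before limit, so that limit is the page size
    { $skip: skip },
    { $limit: limit }
  ];
}
```

The output would be passed straight to the model, for example Item.aggregate(buildGeoNearPipeline(geo, parsePagination(req.query))), where geo holds the already parsed coordinates and distance.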
Edit - Example Document Amended
Here is an example of a document with all of its properties from the Item collection:
{
"_id":"5cd08927c19d1dd118d39a2b",
"imagePaths":{
"standard":{
"images":[
"users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-aafe69c7-f93e-411e-b75d-319042068921-standard.jpg",
"users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-397c95c6-fb10-4005-b511-692f991341fb-standard.jpg",
"users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-e54db72e-7613-433d-8d9b-8d2347440204-standard.jpg",
"users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-c767f54f-7d1e-4737-b0e7-c02ee5d8f1cf-standard.jpg"
],
"profile":"users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-51318c32-38dc-44ac-aac3-c8cc46698cfa-standard-profile.jpg"
},
"thumbnail":"users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-51318c32-38dc-44ac-aac3-c8cc46698cfa-thumbnail.jpg",
"medium":"users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-51318c32-38dc-44ac-aac3-c8cc46698cfa-medium.jpg"
},
"location":{
"type":"Point",
"coordinates":[
-110.8571443,
35.4586858
]
},
"numberOfViews":0,
"numberOfLikes":0,
"monetarySellingAmount":9000,
"exchangeCategories":[
"Math"
],
"itemCategories":[
"Sports"
],
"title":"My title",
"itemDescription":"A description",
"exchangeRadius":10,
"owner":"zbYmcwsGhcU3LwROLWa4eC0RRgG3",
"reports":[],
"createdAt":"2019-05-06T19:21:13.217Z",
"updatedAt":"2019-05-06T19:21:13.217Z",
"__v":0
}
Questions
Based on the above, I wanted to ask a few questions.
1. Is there a performance difference between my implementations of the normal Mongoose Query and the use of the Aggregation Pipeline?
2. Is it correct to say that $near and $geoNear are pretty much similar to $nearSphere when using the 2dsphere index with GeoJSON, except that $geoNear provides extra data and default limiting? That is, although they use different units, both queries would conceptually show relevant data within a specific radius of some location, despite the fact that the field is called radius for $nearSphere and maxDistance for $near/$geoNear.
3. With my example above, how might the performance loss of using skip be mitigated while still achieving pagination in both querying and aggregation?
4. The find() function allows an optional parameter to determine which fields will be returned. The Aggregation Pipeline takes a $project stage to do the same. Is there a specific order where $project should be used in the pipeline to optimize speed/efficiency, or does it not matter?
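To make the last question concrete, here is a small sketch of the two forms of projection I am comparing. The helper toProjection and the chosen field names are only illustrative assumptions; the sketch shows the shape of each form and does not say where the $project stage should go.

```javascript
// Hypothetical helper: turns a list of field names into an inclusion projection.
function toProjection(fields) {
  return Object.fromEntries(fields.map((field) => [field, 1]));
}

// Example field names taken from the sample document above
const projection = toProjection(['title', 'location', 'numberOfLikes']);

// Form 1: projection as the second argument to find()
// const items = await Item.find(query, projection, options).lean().exec();

// Form 2: the same projection as a $project stage in the pipeline
const projectStage = { $project: projection };
```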
I hope this style of question is permitted as per the Stack Overflow rules. Thank you.