
I have an Item collection that could hold anywhere from thousands to hundreds of thousands of documents, and I want to run geospatial queries against it. With Mongoose there are two options: find() and the Aggregation Pipeline. My implementations of both are shown below:

Mongoose Model

To start, here are the relevant properties of my Mongoose Model:

const mongoose = require('mongoose');

// Define the schema
const itemSchema = new mongoose.Schema({
    // Firebase UID (in addition to the Mongo ObjectID)
    owner: {
        type: String,
        required: true,
        ref: 'User'
    },
    // ... Some more fields
    numberOfViews: {
        type: Number,
        required: true,
        default: 0
    },
    numberOfLikes: {
        type: Number,
        required: true, 
        default: 0
    },
    location: {
        type: {
            type: String,
            enum: ['Point'],
            default: 'Point',
            required: true
        },
        coordinates: {
            type: [Number],
            required: true,
        },
    }
}, {
    timestamps: true
});

// 2dsphere index
itemSchema.index({ "location": "2dsphere" });

// Create the model
const Item = mongoose.model('Item', itemSchema);
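The `location` field stores a GeoJSON Point, and a `2dsphere` index expects coordinates in [longitude, latitude] order, with longitude in [-180, 180] and latitude in [-90, 90]. A minimal sketch of a hypothetical helper (`toGeoJsonPoint` is illustrative, not part of Mongoose) that builds such a value from string input and rejects out-of-range coordinates before they reach the database:

```javascript
// Hypothetical helper: builds a GeoJSON Point for the `location` field.
// GeoJSON order is [longitude, latitude], not [lat, lng].
function toGeoJsonPoint(lng, lat) {
  const longitude = Number(lng);
  const latitude = Number(lat);
  // Number.isFinite rejects NaN (e.g. from undefined or non-numeric text) and Infinity.
  if (!Number.isFinite(longitude) || longitude < -180 || longitude > 180) {
    throw new RangeError(`Invalid longitude: ${lng}`);
  }
  if (!Number.isFinite(latitude) || latitude < -90 || latitude > 90) {
    throw new RangeError(`Invalid latitude: ${lat}`);
  }
  return { type: 'Point', coordinates: [longitude, latitude] };
}

// Usage with the sample document's coordinates, passed as strings as they arrive from req.query.
const point = toGeoJsonPoint('-110.8571443', '35.4586858');
console.log(JSON.stringify(point));
// {"type":"Point","coordinates":[-110.8571443,35.4586858]}
```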

Find Query

// These variables are populated based on URL Query Parameters.
const match = {};
const sort = {};

// Query to make.
const query = {
    location: {
        $near: {
            $maxDistance: parseFloat(req.query.maxDistance),
            $geometry: {
                type: 'Point',
                // parseFloat keeps the decimal part; parseInt would truncate the coordinates
                coordinates: [parseFloat(req.query.lng), parseFloat(req.query.lat)]
            }
        }
    },
    ...match
};

// Pagination and Sorting
const options = {
    limit: parseInt(req.query.limit),
    skip: parseInt(req.query.skip),
    sort
};

const items = await Item.find(query, undefined, options).lean().exec();

res.send(items);
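Coordinates should be parsed with `parseFloat` rather than `parseInt`, because `parseInt` drops the fractional part (`parseInt('-110.8571443', 10)` is `-110`), which can move the search origin by tens of kilometres. A minimal sketch, assuming the same `req.query` parameter names, of a hypothetical builder (`buildNearQuery` and its default page size of 20 are illustrative) that returns the filter and options objects for `Item.find()`:

```javascript
// Hypothetical helper: turns URL query parameters (all strings) into the
// filter and options objects accepted by Item.find(filter, projection, options).
function buildNearQuery(params, match = {}, sort = {}) {
  const lng = parseFloat(params.lng); // parseFloat keeps the fractional part
  const lat = parseFloat(params.lat);
  const filter = {
    location: {
      $near: {
        $geometry: { type: 'Point', coordinates: [lng, lat] },
        $maxDistance: parseFloat(params.maxDistance) // metres for a GeoJSON point
      }
    },
    ...match
  };
  const options = {
    limit: parseInt(params.limit, 10) || 20, // hypothetical default page size
    skip: parseInt(params.skip, 10) || 0,
    sort
  };
  return { filter, options };
}

const { filter, options } = buildNearQuery({
  lng: '-110.8571443', lat: '35.4586858', maxDistance: '5000', limit: '10', skip: '0'
});
console.log(filter.location.$near.$geometry.coordinates); // [ -110.8571443, 35.4586858 ]
console.log(parseInt('-110.8571443', 10));                 // -110
```

The result would then be used as `Item.find(filter, null, options).lean().exec()`. Note that `$near` already returns documents from nearest to farthest, and a non-empty `sort` re-orders the matches, overriding that distance ordering.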

Aggregation Pipeline

Suppose the distance also needs to be calculated:

// These variables are populated based on URL Query Parameters.
const query = {};
const sort = {};

const geoSpatialQuery = {
    $geoNear: {
        near: { 
            type: 'Point', 
            coordinates: [parseFloat(req.query.lng), parseFloat(req.query.lat)] 
        },
        distanceField: "distance",
        maxDistance: parseFloat(req.query.maxDistance),
        query,
        spherical: true
    }
};

const items = await Item.aggregate([
    geoSpatialQuery,
    // Sort the full result set first, then skip earlier pages, then cap the page size.
    { $sort: { distance: -1, ...sort } },
    { $skip: parseInt(req.query.skip) },
    { $limit: parseInt(req.query.limit) }
]).exec();

res.send(items);
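In an aggregation pipeline each stage works on the previous stage's output, so stage order changes the result: with `$limit` before `$skip`, the skip happens inside an already truncated page, and a `$sort` placed after both only reorders that page. A minimal sketch of a hypothetical builder (`buildGeoNearPipeline` and its default page size are illustrative) that keeps `$geoNear` first, as MongoDB requires, and then applies `$sort`, `$skip` and `$limit`; it assumes nearest-first ordering with `_id` as a tie-breaker so pages stay stable:

```javascript
// Hypothetical helper: builds the $geoNear pipeline with stages in
// pagination-safe order. $geoNear must be the first stage of the pipeline.
function buildGeoNearPipeline(params, query = {}) {
  const limit = parseInt(params.limit, 10) || 20; // hypothetical default page size
  const skip = parseInt(params.skip, 10) || 0;
  return [
    {
      $geoNear: {
        near: { type: 'Point', coordinates: [parseFloat(params.lng), parseFloat(params.lat)] },
        distanceField: 'distance',
        maxDistance: parseFloat(params.maxDistance), // metres for a GeoJSON point
        query,
        spherical: true
      }
    },
    // Sort the whole result set first; _id breaks ties between equal distances.
    { $sort: { distance: 1, _id: 1 } },
    { $skip: skip },  // then drop the earlier pages
    { $limit: limit } // then keep one page
  ];
}

const pipeline = buildGeoNearPipeline({
  lng: '-110.8571443', lat: '35.4586858', maxDistance: '5000', limit: '10', skip: '20'
});
console.log(pipeline.map((stage) => Object.keys(stage)[0]));
// [ '$geoNear', '$sort', '$skip', '$limit' ]
```

It would be passed as `Item.aggregate(buildGeoNearPipeline(req.query)).exec()`.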

Edit - Example Document Added

Here is an example of a document with all of its properties from the Item collection:

{
   "_id":"5cd08927c19d1dd118d39a2b",
   "imagePaths":{
      "standard":{
         "images":[
            "users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-aafe69c7-f93e-411e-b75d-319042068921-standard.jpg",
            "users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-397c95c6-fb10-4005-b511-692f991341fb-standard.jpg",
            "users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-e54db72e-7613-433d-8d9b-8d2347440204-standard.jpg",
            "users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-c767f54f-7d1e-4737-b0e7-c02ee5d8f1cf-standard.jpg"
         ],
         "profile":"users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-51318c32-38dc-44ac-aac3-c8cc46698cfa-standard-profile.jpg"
      },
      "thumbnail":"users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-51318c32-38dc-44ac-aac3-c8cc46698cfa-thumbnail.jpg",
      "medium":"users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-51318c32-38dc-44ac-aac3-c8cc46698cfa-medium.jpg"
   },
   "location":{
      "type":"Point",
      "coordinates":[
         -110.8571443,
         35.4586858
      ]
   },
   "numberOfViews":0,
   "numberOfLikes":0,
   "monetarySellingAmount":9000,
   "exchangeCategories":[
      "Math"
   ],
   "itemCategories":[
      "Sports"
   ],
   "title":"My title",
   "itemDescription":"A description",
   "exchangeRadius":10,
   "owner":"zbYmcwsGhcU3LwROLWa4eC0RRgG3",
   "reports":[],
   "createdAt":"2019-05-06T19:21:13.217Z",
   "updatedAt":"2019-05-06T19:21:13.217Z",
   "__v":0
}

Questions

Based on the above, I wanted to ask a few questions.

  1. Is there a performance difference between my implementations of the normal Mongoose Query and the use of the Aggregation Pipeline?

  2. Is it correct to say that near and geoNear are essentially equivalent to nearSphere when using the 2dsphere index with GeoJSON, except that geoNear provides extra data and a default limit? In other words, despite their different units, all of these queries conceptually return the relevant data within a given radius of some location, even though the field is called radius for nearSphere and maxDistance for near/geoNear.

  3. In my examples above, how can the performance cost of using skip be mitigated while still supporting pagination in both the query and the aggregation?

  4. The find() function accepts an optional parameter that determines which fields are returned, and the Aggregation Pipeline uses a $project stage to do the same. Is there a particular position in the pipeline where $project should go to optimize speed/efficiency, or does it not matter?
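On the units in question 2: with a GeoJSON point, the distances passed to `$near`, `$nearSphere` and `$geoNear` are in metres, while `$centerSphere` (used with `$geoWithin`) takes its radius in radians. MongoDB's documentation converts between the two by dividing by the Earth's equatorial radius of roughly 6378.1 km. A minimal sketch of that conversion (the helper names are hypothetical), expressing roughly the same 10 km circle around the sample document's location both ways:

```javascript
// Approximate equatorial radius of the Earth in metres (6378.1 km), the value
// MongoDB's documentation uses when converting distances to radians.
const EARTH_RADIUS_METRES = 6378100;

// Metres (as used by $near / $nearSphere / $geoNear with GeoJSON) -> radians.
function metresToRadians(metres) {
  return metres / EARTH_RADIUS_METRES;
}

// Radians (as used by $centerSphere) -> metres.
function radiansToMetres(radians) {
  return radians * EARTH_RADIUS_METRES;
}

// [lng, lat] from the sample document.
const center = [-110.8571443, 35.4586858];
const byNear = {
  location: { $near: { $geometry: { type: 'Point', coordinates: center }, $maxDistance: 10000 } }
};
const byCenterSphere = {
  location: { $geoWithin: { $centerSphere: [center, metresToRadians(10000)] } }
};
console.log(byCenterSphere.location.$geoWithin.$centerSphere[1]);
```

One practical difference between the two filters: `$near` returns documents sorted by distance, whereas `$geoWithin` does not sort its results.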

I hope this style of question is permitted as per the Stack Overflow rules. Thank you.

  • Can you share a sample document from your collection, please? The aggregation pipeline works like a data flow: documents are filtered as they pass through each stage, so the right design depends on your requirements. – Sheshan Gamage May 08 '19 at 09:51
  • @SheshanGamage Thank you for your comment. Please see the edit for an example document. –  May 08 '19 at 15:57

1 Answer


I tried the query below with a 2dsphere index, using the aggregation pipeline.

db.items.createIndex({location:"2dsphere"})

The aggregation pipeline gives you more flexibility over the result set, and it can also improve performance when running geospatial searches.

db.items.aggregate([
  {
    $geoNear: {
      near: { type: "Point", coordinates: [ -110.8571443, 35.4586858 ] },
      key: "location",
      distanceField: "dist.calculated",
      minDistance: 2,
      query: { "itemDescription": "A description" }
    }
  }
])

Regarding your question about $skip, the following question gives more insight into how the $skip operation behaves: $skip and $limit in aggregation framework
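One common way to avoid the cost of a large `$skip` is range-based (keyset) pagination: rather than skipping N documents, each request sends the sort key of the last document it received and the next page starts after it. With `$geoNear`, the `minDistance` option used in the query above can serve as that starting point. A minimal sketch under stated assumptions: results are ordered by `distance` then `_id`, a document's computed distance is identical between requests, and `buildKeysetGeoPipeline` and the cursor shape `{ distance, id }` are hypothetical:

```javascript
// Hypothetical helper: keyset pagination for $geoNear results ordered by
// (distance ascending, _id ascending). `cursor` is { distance, id } taken from
// the last document of the previous page, or null for the first page.
// In real use cursor.id should be an ObjectId, not a string.
function buildKeysetGeoPipeline(center, maxDistance, pageSize, cursor = null) {
  const geoNear = {
    near: { type: 'Point', coordinates: center },
    distanceField: 'distance',
    maxDistance,
    spherical: true
  };
  if (cursor) {
    // Start at the last distance seen instead of skipping documents.
    geoNear.minDistance = cursor.distance;
  }
  const stages = [{ $geoNear: geoNear }];
  if (cursor) {
    // minDistance is inclusive ("at least"), so drop ties already returned.
    stages.push({
      $match: {
        $or: [
          { distance: { $gt: cursor.distance } },
          { distance: cursor.distance, _id: { $gt: cursor.id } }
        ]
      }
    });
  }
  stages.push({ $sort: { distance: 1, _id: 1 } }, { $limit: pageSize });
  return stages;
}

const firstPage = buildKeysetGeoPipeline([-110.8571443, 35.4586858], 5000, 10);
console.log(firstPage.map((stage) => Object.keys(stage)[0]));
// [ '$geoNear', '$sort', '$limit' ]
```

The trade-off is that clients can only move forward page by page (no jumping to page 50), which usually suits "load more" style feeds.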

You can use $project according to your needs. In our case we didn't have much of a performance issue using $project over 10 million documents.
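On question 4: MongoDB's pipeline optimizer already tries to read only the fields a pipeline needs, and it can move a `$match` ahead of a projection stage, so placement matters less than it might seem. A practical rule of thumb is to put `$project` after any stage that reads the fields you drop; placing it after `$limit` means it only runs on one page of documents. A minimal sketch of a hypothetical helper (`withProjection` is illustrative) that appends an inclusion projection, using field names from the sample document:

```javascript
// Hypothetical helper: appends a $project stage after pagination so the
// projection runs only on the documents of the current page.
function withProjection(pipeline, fields) {
  const projection = {};
  for (const field of fields) {
    projection[field] = 1; // inclusion projection; _id is kept unless excluded
  }
  return [...pipeline, { $project: projection }]; // does not mutate the input
}

// Example: a page of results trimmed to a few fields from the sample document.
const paged = [
  {
    $geoNear: {
      near: { type: 'Point', coordinates: [-110.8571443, 35.4586858] },
      distanceField: 'distance',
      spherical: true
    }
  },
  { $sort: { distance: 1, _id: 1 } },
  { $limit: 10 }
];
const projected = withProjection(paged, ['title', 'imagePaths.thumbnail', 'distance']);
console.log(JSON.stringify(projected[projected.length - 1]));
// {"$project":{"title":1,"imagePaths.thumbnail":1,"distance":1}}
```

With find(), the equivalent is the second argument, for example `Item.find(filter, 'title imagePaths.thumbnail')`.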

Sheshan Gamage
  • Thank you. What is `key` for, and, according to the answer you linked, is it better to use `sort` as the stage prior to `limit`? –  May 09 '19 at 06:17
  • `key` is the name of your indexed geospatial field, here "location":{ "type":"Point", "coordinates":[ -110.8571443, 35.4586858 ] }. Yes, it is always good to use limit at the end of the pipeline. – Sheshan Gamage May 09 '19 at 09:10