0

I have 3 collections:

  • vendor (_id, title)
  • product (_id, title, vendorId)
  • vendor.location (_id, coords, vendorId)

Every vendor has multiple locations (up to 10,000) and multiple products (up to 10,000).

I need to find all products near given location and sort them by distance.

What I have came up so far is:

1) Find all vendors near given location:

db.collection('vendor.locations').aggregate([
  {
    $geoNear: {
      near: givenLocation,
      distanceField: 'distance',
      maxDistance: 1000,
      limit: 1000,
      spherical: true
    }
  },
  {
    $group: {
      _id: "$vendor",
      distance: {$min: "$distance"}
    }
  },
  {
    $sort: {distance: -1}
  }

]);

2) Find all products for matched vendors with $in operator, but AFAIK it does not guarantee the order of the matched items.

  db.collection('products').aggregate([
  {
    $match: {
      vendor: {$in: vendorIds}
    }
  },
  {
    $sort: {} //is it possible to sort items by their position in vendorIds array? 
  },
  {
    $limit: limit
  },
  {
    $skip: skip
  }

)

So, basically the problem boils down to finding a way to sort products.

Another potential solution is to denormalize vendor location into product location, but the potential size of collection (and geo index) is worrying me. Collection of products can be up to 1M with up to 10K embedded locations array, so it would result in up 10B items in geo index. How bad is it?

Dziamid
  • 11,225
  • 12
  • 69
  • 104
  • So if it's really so important to you and you already clearly have the `vendorId` stored on the `product`, then why on earth do you not **at the very least** also store the vendor "location" data on the product as well? Please **stop** thinking relationally, if you are using a storage engine that is designed around other concepts. So as you summized in the end, this is actually how you are meant to do it. Will it cost more in storage? Of course it will. But will it be faster than what you are presently doing? Something about a bear in the woods.... – Blakes Seven Mar 19 '16 at 12:07
  • Related: [Does MongoDB's $in clause guarantee order](http://stackoverflow.com/questions/22797768/does-mongodbs-in-clause-guarantee-order). And being silly with `$lookup` when you should just store the location data on the product data instead. Bottom line *"joins cost more in time than index space costs you in dollars of storage"*. And things that take more time, cost you more dollars ultimately. – Blakes Seven Mar 19 '16 at 12:12
  • @BlakesSeven well I have tried to put location data on product and as I anticipated the index size is terribly huge (over 1GB) and it has only one field at the moment, while I will definately need to add more field to it later. So it seem like denormalization does not solve my problem. – Dziamid Mar 20 '16 at 13:39
  • You say this as it's the end of the world! Again, you are using a NoSQL datastore which does ( and not surprisingly ) trade "space" for "speed" and "scalabilty". If you didn't understand those points then you should not be using the product. If you don't actually need the speed and scalability but want all the convenience, then use an RDBMS instead. – Blakes Seven Mar 20 '16 at 13:46

0 Answers0