MongoDB $lookup inside nested array performance & best practice issue

Question

In MongoDB I have:

Collection customers with embedded array orders:

db.customers.insert([
{_id: 1, name: "John", orders: [
  {articleId: 1, quantity: 10},
  {articleId: 2, quantity: 5}
]}]);

Collection articles

db.articles.insert([
{_id: 1, name: "computer"},
{_id: 2, name: "bike"}
]);

This question is not about design. I investigated a lot into the design, so that is fixed for this question. The collections customers, orders, articles is not my real case. I just took it as an example.

I want this result:

{_id: 1, name: "John", orders: [
  {articleId: 1, name: "computer", quantity: 10},
  {articleId: 2, name: "bike", quantity: 5}
]}

I can get this result with the following query:

db.customers.aggregate([
  { $match: { "_id": 1 } },
  { $unwind: "$orders" },
  {
    $lookup: {
      from: "articles",
      localField: "orders.articleId",
      foreignField: "_id",
      as: "article"
    }
  },
  { $unwind: "$article" },
  { $addFields: { "orders.name": "$article.name" } },
  { $project: { article: 0 } },
  {
    $group: {
      _id: "$_id",
      name: { "$first": "$name" },
      orders: { "$push": "$orders" }
    }
  }
]);

Is this the best possible query?

There are much more properties for every collection. And in case of a collection of 20 million customers. customers have an average of 50 orders per customer. And let say there are 80 articles. Also this query is running every minute. So I need to use as less resources as possible.

Or is there a smarter query that has a better performance using less resources?

*"Don't look at the design."* - You might say that but it's hard not to. if you want to ask a real question then ask it. Performance is always relative to what you do "in reality". If you present a contrived case then we can simply say, "embed the data then". Without context you're just asking the generalized question, which only has generalized answers. — Neil Lunn, May 10 '18 at 06:27
@NeilLunn : I just edited my question to make it more clear, it's not about design. The design is fixed. It's about the query getting my result with the circumstances that I describe. The two questions you gave as duplicate doesn't solve my question. — Herman Fransen, May 10 '18 at 06:40
I don't really consider that much of an edit. At best you are still asking "which is best" and opinion based questions really aren't that useful, for either the answers you get or for future reference for others. On the "very small data" you are actually presenting then you probably should "embed" as that is always favorable "best practice" in the face of where you don't actually have the restriction on data size that forces you to go into another collection in the first place. Hence why you're pointed at the existing references. Not a relational database, so not wise to "soley" use as such. — Neil Lunn, May 10 '18 at 06:44
@NeilLunn I ask if somebody has a better query giving this structure, considering 20 million documents `customer` 50 `orders` per `customer` and 80 `articles`. Is that opinion based? I deleted the last part where I am asking if I should do it in a javascript, so I am not asking "which is best" anymore. Please consider re-opening this question, I really would like to get some help. — Herman Fransen, May 10 '18 at 06:59
Is it opinion based? Yes it is. You've been pointed to the long standing debate of embed vs reference. 50 orders and 80 articles ain't big. There is not even a single post on this site where all answers and comments cannot fit into a single document within MongoDB. Opinion based / Too broad / Been asked many times before. Take your pick. Plenty of prior wisdom to learn from already. — Neil Lunn, May 10 '18 at 07:03
@NeilLunn - My situation is more complex than I describe here. I need this structure for my app. I have read a lot about the debate you mentioned. That good, I still have lots to learn. [For example this article was very good to have read](https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1). If the structure is fixed, is this the best possible query? $unwind - $lookup - $unwind - $group looks like a lot. Only the first $unwind make it 50 x 20 million = 1.000 million documents. Sorry for my tenacity, I am here to learn — Herman Fransen, May 10 '18 at 07:15
@NeilLunn : Thank you for your time and discussing with me. It helped me! — Herman Fransen, May 11 '18 at 12:20

MongoDB $lookup inside nested array performance & best practice issue

0 Answers0