19

I’m starting to learn MongoDB and I at one moment I was asking myself how to solve the “one to many” relationship design in MongoDB. While searching, I found many comments in other posts/articles like ” you are thinking relational “. Ok, I agree. There will be some cases like duplication of information won’t be a problem, like in for example, CLIENTS-ORDERS example.

But, suppose you have the tables: ORDERS, that has an embedded DETAIL structure with the PRODUCTS that a client bought. So for one thing or another, you need to change a product name (or another kind of information) that is already embedded in several orders.

At the end, you are force to do a one-to-many relashionship in MongoDB (that means, putting the ObjectID field as link to another collection) so you can solve this simple problem, don’t you ? But every time I found some article/comment about this, it says that will be a performance fault in Mongo. It’s kind of disappointing

Is there another way to solve/design this without performance fault in MongoDB ?

tapatio
  • 387
  • 1
  • 4
  • 9
  • 3
    I am inviting you to look at these blog posts: http://blog.mongodb.org/post/88473035333/6-rules-of-thumb-for-mongodb-schema-design-part-3 (and part 1 & 2) – Tug Grall Aug 25 '14 at 12:28
  • don't forget that mongodb is a NoSQL database, so if you have a lot of relation like that in your db you should probably go for an SQL db, because mongodb will always be less preforming on that side. This blog post explain it well : http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/comment-page-1/ – BaptisteL Aug 25 '14 at 12:58
  • @Baptistel Wuaw ! Thanks for the amazing article you linked. That open my eyes and confirmed that my assumptions about MongoDB were right. And as the author says: " The MongoDB docs tell you what it’s good at, without emphasizing what it’s not good at ". So I think better so put my efforts in another place. – tapatio Aug 26 '14 at 00:17
  • 1
    @BaptisteL: [Sarah Mei](http://stackoverflow.com/users/66801/sarah-mei) might be a good programmer, but she isn't a MongoDB expert. Reads like she worked only at 2 very simple MongoDB projects and made several [beginner's mistakes](http://blog.mongodb.org/post/88473035333/6-rules-of-thumb-for-mongodb-schema-design-part-3). She nested documents although it's a many-to-many relationship (episodes/actors) and nested arrays with unlimited elements (social feed). To her defense the manual probably didn't explain [data modeling](http://docs.mongodb.org/manual/data-modeling/) as good as today. – Christian Strempfer Nov 10 '14 at 19:56
  • This question is more complex. I found article with a good explanation http://blog.mongodb.org/post/87200945828/6-rules-of-thumb-for-mongodb-schema-design-part-1 – rnofenko Apr 03 '15 at 14:59

2 Answers2

17

One to Many Relations

In this relationship, there is many, many entities or many entities that map to the one entity. e.g.: - a city have many persons who live in that city. Say NYC have 8 million people.

Let's assume the below data model:

 
  //city
  {
  _id: 1,
  name: 'NYC',
  area: 30,
  people: [{
      _id: 1,
      name: 'name',
      gender: 'gender'
        .....
    },
    ....
    8 million people data inside this array
    ....
  ]
}

This won't work because that's going to be REALLY HUGE. Let's try to flip the head.


 //people
 {
 _id: 1,
 name: 'John Doe',
 gender: gender,
 city: {
     _id: 1,
     name: 'NYC',
     area: '30'
       .....
   }
}

Now the problem with this design is that if there are obviously multiple people living in NYC, so we've done a lot of duplication for city data.

Probably, the best way to model this data is to use true linking.


 //people
 {
 _id: 1,
 name: 'John Doe',
 gender: gender,
 city: 'NYC'
}

//city
{
_id: 'NYC',
...
}

In this case, people collection can be linked to the city collection. Knowing we don't have foreign key constraints, we've to be consistent about it. So, this is a one to many relation. It requires 2 collections. For small one to few (which is also one to many), relations like blog post to comments. Comments can be embedded inside post documents as an array.

So, if it's truly one to many, 2 collections works best with linking. But for one to few, one single collection is generally enough.

Zameer Ansari
  • 28,977
  • 24
  • 140
  • 219
15

The problem is that you over normalize your data. An order is defined by a customer, who lives at a certain place at the given point in time, pays a certain price valid at the time of the order (which might heavily change over the application lifetime and which you have to document anyway and several other parameters which are all valid only in a certain point of time. So to document an order (pun intended), you need to persist all data for that certain point in time. Let me give you an example:

{ _id: "order123456789",
  date: ISODate("2014-08-01T16:25:00.141Z"),
  customer: ObjectId("53fb38f0040980c9960ee270"),
  items:[ ObjectId("53fb3940040980c9960ee271"),
          ObjectId("53fb3940040980c9960ee272"),
          ObjectId("53fb3940040980c9960ee273")
         ],
 Total:400
 }

Now, as long as neither the customer nor the details of the items change, you are able to reproduce where this order was sent to, what the prices on the order were and alike. But now what happens if the customer changes it's address? Or if the price of an item changes? You would need to keep track of those changes in their respective documents. It would be much easier and sufficiently efficient to store the order like:

{
  _id: "order987654321",
  date: ISODate("2014-08-01T16:25:00.141Z"),
  customer: {
               userID: ObjectId("53fb3940040980c9960ee283"),
               recipientName: "Foo Bar"
               address: {
                          street: "742 Evergreen Terrace",
                          city: "Springfield",
                          state: null
                         }
            },
  items: [
    {count:1, productId:ObjectId("53fb3940040980c9960ee300"), price: 42.00 },
    {count:3, productId:ObjectId("53fb3940040980c9960ee301"), price: 0.99},
    {count:5, productId:ObjectId("53fb3940040980c9960ee302"), price: 199.00}
  ]
}

With this data model and the usage of aggregation pipelines, you have several advantages:

  1. You don't need to independently keep track of prices and addresses or name changes or gift buys of a customer - it is already documented.
  2. Using aggregation pipelines, you can create a price trends without the need of storing pricing data independently. You simply store the current price of an item in an order document.
  3. Even complex aggregations such as price elasticity, turnover by state / city and alike can be done using pretty simple aggregations.

In general, it is safe to say that in a document oriented database, every property or field which is subject to change in the future and this change would create a different semantic meaning should be stored inside the document. Everything which is subject to change in the future but doesn't touch the semantic meaning (the users password in the example) may be linked via a GUID.

Markus W Mahlberg
  • 19,711
  • 6
  • 65
  • 89
  • Hi Markus, I thank you for take the time to respond, but I think I didn't explain me well. I understand your point: the order and details where sold at certain price and sent to one address at a give moment in time. If fact your solution is true, but I keep seeing a relational pattern: we must have to use the Id yet. But what I mean is this: suppose your product's name has a typo and its already embeded in many docs. So at the end, the solution is having an Id link ( like in relational ) so you avoid to repeat data. What's the point then about having a document paradigm ? – tapatio Aug 25 '14 at 23:28
  • 1
    Since you do not add the product name to other documents, but only the attributes of a product which are valid for a certain point in time, this shouldn't be a problem. Linking objects isn't a problem, but is has to be done very carefully in order to maintain data validity. So in my example, there wouldn't be the necessity. In case you want to absolutely put the product names into the order document, you can update them quite easily `db.orders.update( {"items.name": withTypo }, { $set: { "items.$.name" : withoutTypo} } )`. – Markus W Mahlberg Aug 26 '14 at 07:22