Best Model for Representing Many to Many relationships with attributes in MongoDB

Question

What's the most 'mongo' way of representing many-to-many relationships that have attributes?

So for example:

Intro

MYSQL tables

people => firstName, lastName, ...

Movies => name, length ..

peopleMovies => movieId, personId, language, role

Solution 1

Embed people into movies...?

In MongoDB I understand it's good to denormalize and embed but I don't want to embed people into movies, it just doesn't logically make any sense. Because people don't necessarily only have to belongs to movies.

Solution 2

People and Movies will be two separate collections. People => embed [{movieId: 12, personId: 1, language: "English", role: "Main"} ...]

Movies => embed [{movieId: 12, personId: 1, language: "English", role: "Main"} ...]

The issue with this solution is that when we want to update a person's role for a specific movie we'll need to run two update queries to ensure data is in sync in both collections.

Solution 3

We can also do something much more relational like and end up with three collections

People => firstName, lastName, ... Movies => name, length .. Castings => movieId, personId, language, role

The issue with this is that because of the lack of a join statement in MongoDB, it would take 3 queries to go from people -> movies and vice versa.

Here is my question, what are some other ways to model something like this in MongoDB and in a more NoSQL way. In terms of the solutions provided, which one would be the best in terms of performance and convention in mongo.

You are [trying to use Mongo as a relational database](http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/). That way lies madness. — Jim Garrison, Aug 16 '14 at 22:10
Although that article has some nice points, I feel like it completely misses a lot of things. Like embedding cast_members into tv_shows. What if you want to find the tv_shows with a specific cast member? — Maaz, Aug 16 '14 at 22:49
MongoDB was not designed to store relational data. Much of Meteor's API (template iteration, DDP, allow/deny, publications) discourages the use of embedded documents. In my experience, fighting Meteor is a lot harder than fighting Mongo. My general advice is to grit your teeth and store your documents relationally (solution 3). This topic is also covered in the "denormalization" chapter of [Discover Meteor](http://book.discovermeteor.com/). — David Weldon, Aug 16 '14 at 23:40
@DavidWeldon can't check the resource since I'd need to buy it. I'd like to ask you if you've got some resources on how to setup the queries with the relational model in meteor (with pub/sub methods) — Maaz, Aug 17 '14 at 17:55

score 14 · Accepted Answer · edited Jun 20 '20 at 09:12

In many ways meteor's API encourages flat relational documents, however MongoDB is a non-relational data store. This conflict is, unfortunately, left as an exercise for the developer to solve.

The notion of schema structure and joins is an enormous topic to cover within a single answer, so I will attempt to be as succinct as possible.

Reasons why you should choose a relational model

Assume you have comment and post data. Consider what would happen if you embedded comments within your posts.

DDP operates on documents. All of the comments will be sent every time a new comment in the same post is added.
allow and deny rules operate on documents. It may be unreasonable to expect that the same rules apply simultaneously to both posts and comments.
Publications tend to make more sense in terms of collections. In the above scenario, we could not easily publish a list of comments independent of their posts.
Relational databases exist for good reasons. One of them is to avoid the multiple modification problem inherent in your second solution.

Reasons why you should choose an embedded model

Joins are not supported natively by MongoDB, and there isn't a core package to produce a reactive join.

Recommendations

Use your third solution. In my experience, the reasons for choosing a relational model far outweigh the restrictions imposed by the data store. Of course overcoming the lack of joins isn't easy, but the pain is likely to be isolated to only a handful of publish functions. Here are some resources I'd highly recommend:

How to publish a many-to-many relationship on EventedMind. Chris covers your exact use case in detail, however he manually does the reactive join with observe callbacks, which I don't recommend.
Reactive joins in meteor from the Discover Meteor Encyclopedia. This covers the basics of how and why one should do a reactive join.
The denormalization chapter from Discover Meteor. This covers many of the points I made above and also talks about when and how to denormalize some of your data.
You can use Publish with relations to join your data. Alternative packages include: smart publish, publish composite, and simple publish.

If you need more information beyond this, please comment below and I will update my answer.

Though 'publish composite' does a pretty decent job of publishing related data, of course the client just gets access to that data in their individual collections - the relationships are lost. So the client logic has to have knowledge of the data structures to do this again, which is a shame. — Andy Lorenz, Sep 21 '15 at 22:02
Agreed. Adding a [model layer](http://dweldon.silvrback.com/models) to your app can help. — David Weldon, Sep 21 '15 at 22:05
David that's awesome, thank you for that pointer! It looks a great way to consolidate related data into a single collection, AND encapsulate that into a single code block. I'm so pleased I dropped my adhoc comment onto your answer to discover this ! — Andy Lorenz, Sep 21 '15 at 22:23

score 1 · Answer 2 · answered Oct 03 '15 at 10:09

I think you should denormalize your collections. The important point when designing MongoDB collections and documents is to think about your views. What data do you need to display your view? The idea is that you should try to have that data be part of your document.

For example, in your case, you probably have a view for Movies where you want to display information about a movie. But that page about a movie probably needs just basic information about each person (first name, last name, photo URL). Not all other things. And vice-versa, the page about a person will probably list all movies, but also again only a subset of information about each movie is needed, like title, year, and poster photo URL.

So one option would be to have two collections, but then embed (denormalize) just those few fields you need between collections. So for example, Movies collection would have a field people which would be an array of subdocuments. And the People collection would have movies field which would be an array of subdocuments, with those extra fields you want to specify role and so on.

So documents might be something like the following. For movies:

{
  _id: "AAA",
  title: "...",
  year: 2015,
  length: 120,
  posterURL: "...",
  people: [
    {
      person: {
        _id: "BBB",
        firstName: "...",
        lastName: "...",
        photoURL: "..."
      },
      role: "..."
    }
  ]
}

For people:

{
  _id: "BBB",
  firstName: "...",
  lastName: "...",
  photoURL: "...",
  movies: [
    {
      _id: "AAA",
      title: "...",
      year: 2015,
      posterURL: "..."
    }
  ]
}

Of course, the question is how to keep those fields in sync. What if you update the poster photo URL of a movie, you want it to be updated in all Person documents as well. To solve this problem, we developed PeerDB, a package to define relations between collections which then makes sure they are kept in sync.

So in your case, I would have such collections defined in PeerDB, in CoffeeScript:

class People extends Document
  @Meta
    name: 'People'

class Movies extends Document
  @Meta
    name: 'Movies'
    fields: =>
      people: [
        person: @ReferenceField People, ['firstName', 'lastName', 'photoURL'], true, 'movies', ['title', 'year', 'posterURL']
      ]

In short, this definition says that people.person field should be a reference to People collection and kept in sync for firstName, lastName, photoURL. Additionally, a reverse reference field should be made in People documents under the field movies with title, year, posterURL.

Pretty simple. But there are some downsides. The arrays could get very big (maybe not in the case of movies and people, but for some other data) which could make documents too big for MongoDB per-document limits (at the moment 16 MB). Additionally, if you observe, you will see that for People documents there is no information about the role in the list of movies. This is because the role is not part of the referenced document, but it is something which is next to the reference. What if you would want to display role for movie a person was in on the person page/view?

So, maybe it would be better to have three collections, one for basic information about movies, another for people, and then a collection for relation between people and movies. So data could be maybe something like, for movies:

{
  _id: "AAA",
  title: "...",
  year: 2015,
  length: 120,
  posterURL: "..."
}

For people:

{
  _id: "BBB",
  firstName: "...",
  lastName: "...",
  photoURL: "..."
}

For casting:

{
  _id: "...",
  movie: {
    _id: "AAA",
    title: "...",
    year: 2015,
    posterURL: "..."
  },
  person: {
    _id: "BBB",
    firstName: "...",
    lastName: "...",
    photoURL: "..."
  },
  role: "..."
}

And PeerDB definitions:

class People extends Document
  @Meta
    name: 'People'

class Movies extends Document
  @Meta
    name: 'Movies'

class Casting extends Document
  @Meta
    name: 'Casting'
    fields: =>
      person: @ReferenceField People, ['firstName', 'lastName', 'photoURL']
      movie: @ReferenceField Movies, ['title', 'year', 'posterURL']

PeerDB would then make sure that things are kept in sync. It would also remove casting document if a movie or a person is deleted from the database.

This then allows you to make a Meteor publish which is efficient and does not require any dynamic building of related queries. You simply publish Casting collection and this is it. You can even query on some condition. For example, you want to display all directors sorted by firstName and lastName and their movies? Possible with only one query.