I think you should denormalize your collections. The most important thing when designing MongoDB collections and documents is to think about your views: what data do you need to display each view? Then try to make that data part of the document.
For example, in your case you probably have a view for movies, where you want to display information about a movie. But that page probably needs only basic information about each person (first name, last name, photo URL), not everything else. Conversely, the page about a person will probably list all of their movies, but again it needs only a subset of information about each movie, such as title, year, and poster URL.
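A minimal sketch of that idea in plain JavaScript follows. A hypothetical `pick` helper copies only the fields a view needs out of a full document. The field list is an assumption taken from the example views above, and the sample data is made up.

```javascript
// Hypothetical helper: copy only the listed fields out of a full document.
function pick(doc, fields) {
  const out = {};
  for (const f of fields) {
    if (f in doc) out[f] = doc[f];
  }
  return out;
}

// Fields the movie page is assumed to need about each person.
const PERSON_SUMMARY_FIELDS = ["_id", "firstName", "lastName", "photoURL"];

// A full person document (illustrative values only).
const person = {
  _id: "BBB",
  firstName: "Jane",
  lastName: "Doe",
  photoURL: "https://example.com/jane.jpg",
  biography: "Long text that only the person page shows.",
  birthDate: "1970-01-01",
};

// Only this summary would be embedded in a movie document.
const summary = pick(person, PERSON_SUMMARY_FIELDS);
console.log(Object.keys(summary)); // [ '_id', 'firstName', 'lastName', 'photoURL' ]
```

`biography` and `birthDate` stay only in the full person document. The movie page never loads them.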
So one option would be to have two collections, but embed (denormalize) into each just the few fields you need from the other. For example, the `Movies` collection would have a field `people` holding an array of subdocuments, and the `People` collection would have a field `movies` holding an array of subdocuments. The subdocuments can also carry extra fields that describe the relation, such as the role.
So the documents might look something like the following. For movies:
```javascript
{
  _id: "AAA",
  title: "...",
  year: 2015,
  length: 120,
  posterURL: "...",
  people: [
    {
      person: {
        _id: "BBB",
        firstName: "...",
        lastName: "...",
        photoURL: "..."
      },
      role: "..."
    }
  ]
}
```
For people:
```javascript
{
  _id: "BBB",
  firstName: "...",
  lastName: "...",
  photoURL: "...",
  movies: [
    {
      _id: "AAA",
      title: "...",
      year: 2015,
      posterURL: "..."
    }
  ]
}
```
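Because the same values now live in two places, they can drift apart. The sketch below makes "out of sync" concrete. It is a hypothetical check that compares every person summary embedded in a movie against the current person data. The function name and sample data are illustrative assumptions, not part of any library.

```javascript
// Hypothetical check: list embedded person fields that no longer match
// the canonical person documents.
function findStalePersonEmbeds(movies, peopleById, fields) {
  const stale = [];
  for (const movie of movies) {
    for (const entry of movie.people) {
      const source = peopleById[entry.person._id];
      if (!source) continue; // referenced person no longer exists
      for (const f of fields) {
        if (entry.person[f] !== source[f]) {
          stale.push({ movieId: movie._id, personId: source._id, field: f });
        }
      }
    }
  }
  return stale;
}

// The person's photo changed, but the copy inside the movie did not.
const peopleById = {
  BBB: { _id: "BBB", firstName: "Jane", lastName: "Doe", photoURL: "new.jpg" },
};
const movies = [
  {
    _id: "AAA",
    title: "Example",
    people: [
      {
        person: { _id: "BBB", firstName: "Jane", lastName: "Doe", photoURL: "old.jpg" },
        role: "director",
      },
    ],
  },
];

const stale = findStalePersonEmbeds(movies, peopleById, ["firstName", "lastName", "photoURL"]);
console.log(stale.length, stale[0].field); // 1 photoURL
```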
Of course, the question is how to keep those fields in sync. If you update a movie's poster URL, you want the change reflected in all `People` documents as well. To solve this problem, we developed PeerDB, a package for defining relations between collections that then makes sure they are kept in sync.
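PeerDB does this propagation for you. The minimal in-memory sketch below shows the work it has to do on every update: whenever a movie changes, it rewrites every embedded copy of that movie. It is an illustration only, not PeerDB's implementation. Against a real MongoDB collection, the same step could be expressed as one `updateMany` using the filtered positional operator (`movies.$[m]` with `arrayFilters`, available since MongoDB 3.6). The sketch only simulates that on plain arrays, and `updateMovie` is a hypothetical name.

```javascript
// Fields copied into each person's `movies` array (assumed, matching the example).
const MOVIE_EMBED_FIELDS = ["title", "year", "posterURL"];

// Update the source movie, then rewrite every embedded copy of it.
function updateMovie(moviesById, people, movieId, changes) {
  const movie = moviesById[movieId];
  Object.assign(movie, changes);

  // Only fields that are actually embedded elsewhere need propagating.
  const embedded = Object.keys(changes).filter((f) => MOVIE_EMBED_FIELDS.includes(f));
  if (embedded.length === 0) return 0;

  let touched = 0;
  for (const person of people) {
    for (const entry of person.movies) {
      if (entry._id !== movieId) continue;
      for (const f of embedded) entry[f] = movie[f];
      touched++;
    }
  }
  return touched; // number of embedded copies rewritten
}

const moviesById = {
  AAA: { _id: "AAA", title: "Example", year: 2015, length: 120, posterURL: "old.jpg" },
};
const people = [
  { _id: "BBB", movies: [{ _id: "AAA", title: "Example", year: 2015, posterURL: "old.jpg" }] },
  { _id: "CCC", movies: [{ _id: "AAA", title: "Example", year: 2015, posterURL: "old.jpg" }] },
];

const n = updateMovie(moviesById, people, "AAA", { posterURL: "new.jpg" });
console.log(n, people[1].movies[0].posterURL); // 2 new.jpg
```

Changing only `length` would return 0, because `length` is not embedded anywhere.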
So in your case, I would define such collections in PeerDB like this (in CoffeeScript):
```coffeescript
class People extends Document
  @Meta
    name: 'People'

class Movies extends Document
  @Meta
    name: 'Movies'
    fields: =>
      people: [
        person: @ReferenceField People, ['firstName', 'lastName', 'photoURL'], true, 'movies', ['title', 'year', 'posterURL']
      ]
```
In short, this definition says that the `people.person` field should be a reference to the `People` collection and be kept in sync for `firstName`, `lastName`, and `photoURL`. Additionally, a reverse reference field should be created in `People` documents under the field `movies`, containing `title`, `year`, and `posterURL`.
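The reverse field is derived data. Each person's `movies` array is what you get by inverting the `people` arrays of all movies. The plain JavaScript sketch below performs that inversion (`buildReverseMovies` is hypothetical, not PeerDB code). It also makes one of the downsides visible: the role stored next to each reference does not end up on the person side.

```javascript
// Hypothetical sketch: derive each person's `movies` array from Movies.people.
function buildReverseMovies(movies, fields) {
  const byPerson = new Map();
  for (const movie of movies) {
    const summary = { _id: movie._id };
    for (const f of fields) summary[f] = movie[f];
    for (const entry of movie.people) {
      const pid = entry.person._id;
      if (!byPerson.has(pid)) byPerson.set(pid, []);
      // Only fields of the movie itself are copied; entry.role is lost here.
      byPerson.get(pid).push({ ...summary });
    }
  }
  return byPerson;
}

const movies = [
  { _id: "AAA", title: "First", year: 2015, posterURL: "a.jpg",
    people: [{ person: { _id: "BBB" }, role: "director" }] },
  { _id: "DDD", title: "Second", year: 2016, posterURL: "d.jpg",
    people: [{ person: { _id: "BBB" }, role: "actor" }, { person: { _id: "CCC" }, role: "actor" }] },
];

const reverse = buildReverseMovies(movies, ["title", "year", "posterURL"]);
console.log(reverse.get("BBB").map((m) => m.title)); // [ 'First', 'Second' ]
```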
That is pretty simple, but there are some downsides. The arrays could get very big (maybe not for movies and people, but for some other data), which could push documents past MongoDB's per-document size limit (currently 16 MB). Additionally, you will notice that `People` documents have no information about the role in their list of movies. This is because the role is not part of the referenced document; it sits next to the reference. What if you wanted to show, on the person page, the role that person had in each movie?
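The back-of-the-envelope sketch below gives a feel for the size limit. MongoDB's limit is 16 MB of BSON (16 × 1024 × 1024 = 16,777,216 bytes). The snippet uses the UTF-8 length of the JSON text as a stand-in for BSON size. That is only an approximation: BSON adds type tags and length prefixes and stores numbers differently. Treat the result as an order of magnitude only. The sample entry is made up.

```javascript
// MongoDB's maximum BSON document size.
const MAX_BSON_BYTES = 16 * 1024 * 1024; // 16,777,216

// Rough estimate only: JSON byte length approximates, but is not, BSON size.
function approxBytes(value) {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// One embedded movie entry as it might appear in a person's `movies` array.
const entry = {
  _id: "AAA",
  title: "Some Movie Title",
  year: 2015,
  posterURL: "https://example.com/posters/AAA.jpg",
};

const perEntry = approxBytes(entry);
const roughCapacity = Math.floor(MAX_BSON_BYTES / perEntry);
console.log(perEntry, roughCapacity);
```

With entries of roughly 100 bytes, this comes out well above a hundred thousand embedded movies per person. For this data the limit is unlikely to matter. The concern is relations that can grow without bound.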
So maybe it would be better to have three collections: one for basic information about movies, another for people, and a third for the relation between people and movies. The data could then look something like this. For movies:
```javascript
{
  _id: "AAA",
  title: "...",
  year: 2015,
  length: 120,
  posterURL: "..."
}
```
For people:
```javascript
{
  _id: "BBB",
  firstName: "...",
  lastName: "...",
  photoURL: "..."
}
```
For casting:
```javascript
{
  _id: "...",
  movie: {
    _id: "AAA",
    title: "...",
    year: 2015,
    posterURL: "..."
  },
  person: {
    _id: "BBB",
    firstName: "...",
    lastName: "...",
    photoURL: "..."
  },
  role: "..."
}
```
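Each casting document is a small join record. It carries the summary fields of both sides plus the data that belongs to the relation itself (the role). The sketch below builds one from full documents. `makeCasting` and the caller-supplied ID are hypothetical; normally the database or Meteor would generate `_id`.

```javascript
const MOVIE_FIELDS = ["_id", "title", "year", "posterURL"];
const PERSON_FIELDS = ["_id", "firstName", "lastName", "photoURL"];

// Copy only the listed fields that exist on the document.
function pick(doc, fields) {
  return Object.fromEntries(fields.filter((f) => f in doc).map((f) => [f, doc[f]]));
}

// Hypothetical helper: build a casting document from full movie and person docs.
function makeCasting(id, movie, person, role) {
  return {
    _id: id, // assumption: supplied by the caller for illustration
    movie: pick(movie, MOVIE_FIELDS),
    person: pick(person, PERSON_FIELDS),
    role, // lives on the relation, so both the movie and the person page can show it
  };
}

const movie = { _id: "AAA", title: "Example", year: 2015, length: 120, posterURL: "a.jpg" };
const person = { _id: "BBB", firstName: "Jane", lastName: "Doe", photoURL: "j.jpg" };

const casting = makeCasting("C1", movie, person, "director");
console.log(casting.movie, casting.role);
```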
And the PeerDB definitions:
```coffeescript
class People extends Document
  @Meta
    name: 'People'

class Movies extends Document
  @Meta
    name: 'Movies'

class Casting extends Document
  @Meta
    name: 'Casting'
    fields: =>
      person: @ReferenceField People, ['firstName', 'lastName', 'photoURL']
      movie: @ReferenceField Movies, ['title', 'year', 'posterURL']
```
PeerDB would then make sure that everything is kept in sync. It would also remove a casting document when the corresponding movie or person is deleted from the database.
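The cascading delete works in two steps. First remove the movie, then remove every casting whose `movie._id` points at it. The in-memory sketch below illustrates the described behaviour; it is not PeerDB's code. With MongoDB, the second step would be a `deleteMany` filtered on `movie._id`. Deleting a person works the same way with `person._id`. `deleteMovie` is a hypothetical name.

```javascript
// Sketch of a cascading delete over in-memory arrays.
function deleteMovie(state, movieId) {
  state.movies = state.movies.filter((m) => m._id !== movieId);
  const before = state.casting.length;
  // Drop every relation document that references the removed movie.
  state.casting = state.casting.filter((c) => c.movie._id !== movieId);
  return before - state.casting.length; // number of castings removed
}

const state = {
  movies: [{ _id: "AAA" }, { _id: "DDD" }],
  casting: [
    { _id: "C1", movie: { _id: "AAA" }, person: { _id: "BBB" }, role: "director" },
    { _id: "C2", movie: { _id: "AAA" }, person: { _id: "CCC" }, role: "actor" },
    { _id: "C3", movie: { _id: "DDD" }, person: { _id: "BBB" }, role: "actor" },
  ],
};

console.log(deleteMovie(state, "AAA")); // 2
```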
This then allows you to write a Meteor publish function that is efficient and does not require dynamically building related queries. You simply publish the `Casting` collection, and that is it. You can even query on some condition. For example, do you want to display all directors sorted by `firstName` and `lastName`, together with their movies? That takes only one query.
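In Meteor terms, that is a single `find` on `Casting` with a selector like `{role: 'director'}` and a sort on `person.firstName`, then `person.lastName`. The sketch below runs the same logic on plain arrays. The role value `"director"`, the sample data, and the client-side grouping step are assumptions for illustration. The grouping is done after the one query, not as extra queries.

```javascript
// In-memory equivalent of one query on Casting: filter by role, sort by name,
// then group the rows per person on the client.
function directorsWithMovies(casting) {
  const rows = casting
    .filter((c) => c.role === "director")
    .sort(
      (a, b) =>
        a.person.firstName.localeCompare(b.person.firstName) ||
        a.person.lastName.localeCompare(b.person.lastName)
    );

  // Group by person _id; a Map keeps first-seen (that is, sorted) order.
  const byPerson = new Map();
  for (const c of rows) {
    if (!byPerson.has(c.person._id)) {
      byPerson.set(c.person._id, { person: c.person, movies: [] });
    }
    byPerson.get(c.person._id).movies.push(c.movie.title);
  }
  return [...byPerson.values()];
}

const casting = [
  { movie: { _id: "AAA", title: "First" }, person: { _id: "P2", firstName: "Zoe", lastName: "Adams" }, role: "director" },
  { movie: { _id: "DDD", title: "Second" }, person: { _id: "P1", firstName: "Anna", lastName: "Berg" }, role: "director" },
  { movie: { _id: "EEE", title: "Third" }, person: { _id: "P1", firstName: "Anna", lastName: "Berg" }, role: "director" },
  { movie: { _id: "AAA", title: "First" }, person: { _id: "P3", firstName: "Bob", lastName: "Cole" }, role: "actor" },
];

for (const d of directorsWithMovies(casting)) {
  console.log(`${d.person.firstName} ${d.person.lastName}: ${d.movies.join(", ")}`);
}
// Anna Berg: Second, Third
// Zoe Adams: First
```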