3

I've spent some time researching MongoDB alternatives for implementing a many-to-many relationships including several stackoverflow articles (here and here) and these slides.

I am creating an app using the MEAN stack and I'm trying to get confirmation on my schema setup and best practices in dereferencing a collection of objects.

I have a basic many-to-many relationship between users and meetings (think scheduling meetings for users where a user can be in many meetings and a meeting contains several users).

Given my use case I think it's best that I use referencing rather than embedding. I believe (from what I've read) that it would be better to use embedding only if my meetings had users unique to a single meeting. In my case these same users are shared across meetings. Also, although updating users would be infrequent (e.g., change username, password) I still feel that using a reference feels right - although I'm open to opinions.

Assuming I went with references I have the following (simplified) schema:

var MeetingSchema = new Schema({
  description: {
    type: String,
    default: '',
    required: 'Please fill in a description for the meeting',
    trim: true
  },
  location: {
    type: String,
    default: '',
    required: 'Please fill in a location for the meeting',
    trim: true
  },
  users: [ {
    type: Schema.ObjectId,
    ref: 'User'
  } ]
});

var UserSchema = new Schema({
    firstName: {
        type: String,
        trim: true,
        default: '',
        validate: [validateLocalStrategyProperty, 'Please fill in your first name']
    },
    lastName: {
        type: String,
        trim: true,
        default: '',
        validate: [validateLocalStrategyProperty, 'Please fill in your last name']
    },
    email: {
        type: String,
        trim: true,
        default: '',
        validate: [validateLocalStrategyProperty, 'Please fill in your email'],
        match: [/.+\@.+\..+/, 'Please fill a valid email address']
    },
    username: {
        type: String,
        unique: true,
        required: 'Please fill in a username',
        trim: true
    },
    password: {
        type: String,
        default: '',
        validate: [validateLocalStrategyPassword, 'Password should be longer']
    }
});

First, you will notice that I don't have a collection of meetings in users. I decided not to add this collection because I believe I could use the power of a MongoDB find to obtain all meetings associated with a specific user - i.e.,

db.meetings.find({users:ObjectId('x123')});

Of course I would need to add some indexes.

Now if I'm looking to deference my users for a specific meeting, how do I do that? For those who understand rails and know the different between :include and :join I'm looking for a similar concept. I understand we are not dealing with joins in MongoDB, but for me in order to dereference the users collection from the meeting to get a user's first and last name I would need to cycle through the collection of id's and perform some sort of a db.users.find() for each id. I assume there's some easy MongoDB call I can make to get this to occur in a performant way.

Community
  • 1
  • 1
Arthur Frankel
  • 4,695
  • 6
  • 35
  • 56
  • 1
    If you have all the users' ids in a particular meeting you would just do a single find against users collection to get their names: db.users.find({id:{$in:}},{firstName:1,lastName:1}) in essence. Why would you not just store first and last name of the user in the user array in meetings collection? – Asya Kamsky Jul 13 '14 at 05:13
  • Thank you. So it sounds like I have to do a separate (single) query to get the users using the $in operation - which I guess is fine although I wonder if I could combine the two queries into one (getting the meetings and the user's first and last names)? The reason I'm not just storing the first and last names is that I would have to update those in the rare last of a user changing their name. I imagine that's an ok tradeoff meaning I should just store those and deal with the rare case. – Arthur Frankel Jul 13 '14 at 13:18
  • exactly - why optimize for something that *might* happen - plus do you really need to update a name of a person in a meeting that already took place? I would think that if they attended that meeting as Joe Shmoe, it shouldn't be changed *after* the fact because now he's known as Joe Doe... Even if you do want to change it, it's a single "multi" update. – Asya Kamsky Jul 13 '14 at 21:39

1 Answers1

6

For a discussion of schema design in MongoDB, covering exactly this topic, I refer you to these postings on the MongoDB blog:

In particular, look at the sample JavaScript code showing you how to do the application-level joins.

William Z
  • 10,989
  • 4
  • 31
  • 25
  • Thank you this is a great set of articles. So it seems that Mongoose's populate() essentially does the "application-level join"/dereferencing of the data. Is that correct? – Arthur Frankel Jul 13 '14 at 20:28
  • That's exactly what populate() does. I'd suggest re-reading this page in light of what you've learned in the articles – William Z Jul 14 '14 at 04:32
  • What's not shown in the documentation is how the query is actually implemented. Since the exec of the query comes after the populate functions I assume mongoose is optimizing the query perhaps into a single query rather than multiple and certainly not n+1. In a traditional rdbms the "populate" function should just do a single join. Since mongodb doesn't have joins I assume the populate is doing the $in clause in a secondary query (like you examples) rather than somehow combining the retrieval of the main table with the collection into a single query. – Arthur Frankel Jul 14 '14 at 13:05
  • I don't know much about Mongoose. The definition for populate() is here AFAICT: https://github.com/LearnBoost/mongoose/blob/3.8.x/lib/model.js#L2060-L2218 – William Z Jul 14 '14 at 17:00