10

I come from a MySQL background and am trying to wrap my head around MongoDB. In particular, I'm struggling to conceptualize how I should model n:n relationships the "Mongo way."

For this example, let's say we have two collections: users and interests. We need to be able to represent or query for several things in our data:

  • User's interests
  • User's rating of interest, e.g. "like" or "dislike"
  • Users with a given interest
  • Counter (which can be incremented/decremented) of each rating of the interest
  • Interest name

In MySQL, I would have created a users_interests table indexed on both user IDs and interest IDs. For the counter, I would have had separate columns for each rating type, and each time a user rated/un-rated an interest, done a transaction to ensure that the counts were never false.

I've tried reading about some schema designs, but to no avail.

Can you help a lost soul find the way?

Josh Smith
  • 14,674
  • 18
  • 72
  • 118

2 Answers2

14

Great question. Let me first outline a bit of how the N:N relationship works then I'll go into detail on each of your bullet points.

N:N in MySQL normally you have your pivot table associating between your user and interests (user_interests table). In mongo you do this a bit differently. You still have a users and interest collection, however instead now, you store a list of keys under interests for a user. SO something like this:

User Collection {
      "name":"Josh",
      "user":"jsmith",
      "interests":[
           {
            "_id":12345,
            "rating":"like"
           },
           {..}..
      ]
}

By storing your interests in a list which is keyed off on your interest table, you can perform each of the actions you require. If you wanted to do a query you'd od it based on the ID which is in the interest table then do a query using the $in modifier.

Now for your interests collection I'd do the following:

User Interest {
      "_id":objectId
      "label":"Swimming",
      "count":intValue
}

When adding an interest to a users document, the count variable would then depend on the definition of your ratings. If you're storing your ratings in a separate area (or in logic), then the value you assigned to them would be what you relate then to the int value in interest. IE: User rates it meh (which has a value of 1) then you would add 1 to the count value.

Hopefully this is helpful and has at the very least brought about some other ideas on how to structure it!

Best of luck, remember MONGO IS AWESOME.

Petrogad
  • 4,405
  • 5
  • 38
  • 77
  • why store the id? It's unlikely that you'll rename an interest, but keep the references, right? Also, you'll have to perform an `$in` to display the names of the likes. – mnemosyn Nov 30 '11 at 18:34
  • @mnemosyn Normally the code path you'd have one of two scenarios: 1: You have a interest you want to see which users have, or 2: you want to see which interests a user has. With the first, you already have the name, and now you're just doing a query on an ID (faster query) rather than a string. With the second, you have a list of id's which again you're querying on the object ID. Also if you do change an interest name you don't have to refactor through every user entry to update it, as all you need is the ref id to that particular interest in the interest collection. Hope this helps! – Petrogad Nov 30 '11 at 18:37
  • @mnemosyn What if I decide to uppercase the interest, from "fishing" to "Fishing" or something? – ceejayoz Nov 30 '11 at 18:40
  • Exactly, this is why it's always good to keep a ref key, then make your modifications in one area with an identifier that is always constant. – Petrogad Nov 30 '11 at 18:41
  • Can you provide a resource that says that Mongo's BTree queries are faster for ids than for strings? I believe the overhead for a list of objects and (presumably very common) `$in` queries will be much greater. – mnemosyn Nov 30 '11 at 18:42
  • @mnemosyn I'll take a look, I'm fairly certain about this, however will need to take a peak and look when at home. I'll post the link in my answer. – Petrogad Nov 30 '11 at 18:47
  • Awesome answer so far. I'm curious about the other points, particularly how you would structure the count. – Josh Smith Nov 30 '11 at 18:52
  • @Frederico Thanks for the answer update. I really should have mentioned, but my UI use case dictates that I have different counts for each rating. In this case the rating types are "like," "dislike", "meh", "love", and "bookmark." This means your -1, 0, 1 idea simply won't work for my needs. Great idea in general, though. – Josh Smith Nov 30 '11 at 19:18
  • @JoshSmith Gotcha, I'll modify that. How are you tracking what each of your rating types does for incrementing / decrementing the rating which an interest has then? Do you have a mapping in your code which performs that logic? – Petrogad Nov 30 '11 at 19:20
  • @Frederico There will actually be five separate counts, each with their own values. – Josh Smith Nov 30 '11 at 21:40
1

To maintain a global count of the ratings of each interest, you will need a separate and independent collection where you update (add or subtract) ratings using atomic update operators as and when like/dislike actions for interests are performed by users.

You can store each User's interest as an array of sub-documents within the User collection itself.

The JSON structure of this data would be something similar to:

db.User
{
    name: 'joe',
    ....,
    interests : [{ name: 'swimming', rating: 10},
              { name: 'cooking', rating: 22 }
              ]
}

Now you can query on the internal keys using:

> db.User.find( { "interests.name" : "cooking" } )

This will return users who have a particular interest.

epicwhale
  • 1,835
  • 18
  • 26
  • This way, the rating is denormalized for every user, i.e. every user has a copy of the global rating (`rating : 101) This can't be maintained. – mnemosyn Nov 30 '11 at 18:32
  • There's a difference here between ratings and counts. The count is a global, the rating is relative to the user. The rating should be a string, though, and not an integer. – Josh Smith Nov 30 '11 at 18:44
  • 1
    I removed my answer. I favor your approach, but I recommend you add a note about the need for `$elemMatch` and make `rating` a string. – mnemosyn Nov 30 '11 at 18:53