1

Schema:

articles: [
  {
    _id: uid,
    owner: userId,
    title: string,
    text: text,
  }
],
comments_1: [
  {
    // single comment
    articleId: uid,
    text: text,
    user: {
      name: string,
      id: uid
    }
  }
],
comments_2: [
  {
    // all comments at once
    articleId: uid,
    comments: [
      {
        _id: commentId,
        text: text,
        user: {
          name: string,
          id: uid
        }
      }
    ],
  }
],

I'm a bit confused with mongodb recommendations: Say, i need to retrieve information for an article page. I'll need to do 2 requests, first to find article by id, and the second to find comments. If i'd include comments (comments_2) as property into each article, i'd need to perform only one query to get all the data i need, and if i'd need to list say, titles of 20 articles, i'd perform a query with specified properties to be retrieved, right?

  1. Should i store comments and articles in different collections?
  2. If comments will be in different store, should i store comments the comments_1 way or comments_2 way?

I'll avoid deep explanations, because the schema explains my point clearly, i guess. Briefly, i don't get if it's better to store everything in one place and then specify properties i want to retrieve while querying, or abstract pieces of data to different collections?

stkvtflw
  • 12,092
  • 26
  • 78
  • 155

3 Answers3

2

In a relational database, this would be achieved by JOIN. Apparently, there is a NoSQL equivalent in MongoDB, starting from version 3.2 called $lookup

This allows you to keep comments and articles in separate schemas, but still retrieve list of comments for an article with a single query.

Stack Overflow Source

Community
  • 1
  • 1
msaw328
  • 1,459
  • 10
  • 18
2

It's a typical trade-off you have to make. Both approaches have their own pros and cons and you have to choose what fits best for your use case. Couple of inputs:

Single table:

  • fast load single article, since you load all data in one query
  • no issues with loading titles of 20 articles (you can query only subset of fields using projection

Multiple table:

  • much easier to do perpendicular queries (e.g comments made by specific user, etc)
  • I would go with version 1, since it's simpler and version 2 won't give you any advantage
Yaroslav Admin
  • 13,880
  • 6
  • 63
  • 83
  • Thank you, @YaroslavAdmin, one more clarification: if i'll store everything in one collection (including comments), is there way to retrieve, say, all data, but specific quantity of comments? – stkvtflw Dec 31 '16 at 12:39
  • @stkvtflw Yes, it's possible. See [this answer](http://stackoverflow.com/a/8452577/1377864). – Yaroslav Admin Dec 31 '16 at 12:44
2

Well, MongoDB models are usually meant to hold data and relationship together since it doesn't provides JOINS ($lookup is the nearest to join and costly, best to avoid).

That's why in DB modeling there is huge emphasis on denormalization, since there are two benefits of storing together

  1. You wouldn't have to join the collections and you can get the data in a single query.
  2. Since mongo provides atomic update, you can update comments and article in one go, not worrying about transaction and rollback.

So almost certainly you would like to put comments inside article collection. So it would be something like

articles: [
  {
    _id: uid,
    owner: userId,
    title: string,
    text: text,
    comments: [
      {
        _id: commentId,
        text: text,
        user: {
          name: string,
          id: uid
        }
      }
    ]
  }
]

Before we agree to it, let us see the drawback of above approach.

  1. There is a limit of 16MB per document which is huge, but think if the text of your article is large and the comments on that article is also in large number, maybe it can cross 16 MB.

  2. All the places where you get article for other purposes you might have to exclude the comments field, otherwise it would be heavy and slow.

  3. If you have to do aggregation again we might get into memory limit issue if we need to aggregate based on comments also one way or other.

These are serious problem, and we cannot ignore that, now we might want to keep it in different collection and see what we are losing.

First of all comment and articles though linked but are different entity, so you might never need to update them together for any field.

Secondly, you would have to load comments separately, which makes sense in normal use-case, in most application that's how we proceed, so that too is not an issue.

So in my opinion clear winner is having two separate collection

articles: [
  {
    _id: uid,
    owner: userId,
    title: string,
    text: text,
  }
],
comments: [
  {
    // single comment
    articleId: uid,
    text: text,
    user: {
      name: string,
      id: uid
    }
  }
]

You wouldn't want to go comment_2 way if you are choosing for two collection approach, again for same reason as what if there are huge comments for a single article.

Rahul Kumar
  • 2,781
  • 1
  • 21
  • 29