34

How would you design the schema for a blog-like site with document-based databases (mongodb). The site has the following objects: User, Article, Comment. User can add Comments to Article. Each User can also vote exactly once per Comment.

I want to be able to do these queries efficiently:
1. get Article A, comments on Article A and # of votes per comments
2. get all comments by User B across all articles
3. get all comments User B voted for

My first attempt is to put articles and comments in separate collections and comment can contain a list of users that voted for it. This makes query 1 and 2 simple. And for 3, I added Vote collection which keep tracks of votes by users.

There's some obvious drawback such as duplicating user vote data and query 1 will take two calls to the database. Is there a better approach?

Article {
  "user_id"
}

Comment {
   "user_id",
   "article_id",
   [user_voted],
}

Vote {
    "user_id",
    "comment_id",
}
kefeizhou
  • 6,234
  • 10
  • 42
  • 55

1 Answers1

41
Article {
  "_id" : "A",
  "title" : "Hello World",
  "user_id" : 12345,
  "text" : 'My test article',

  "comments" : [
    { 'text' : 'blah', 'user_id' : 654321, 'votes' : [987654]},
    { 'text' : 'foo', 'user_id' : 987654, 'votes' : [12345, 654321] },
    ...
  ]
}

The basic premise here is that I've nested the Comments inside of the Article. The Votes only apply to a Comment, so they've been stored as an array with each Comment. In this case, I've just stored the user_id. If you want to store more information (time_created, etc.), then you can votes an array of objects:

... 'votes' : [ { user_id : 987654, ts : 78946513 } ] ...

How to perform your queries efficiently:

  1. get Article A, comments on Article A and # of votes per comments
db.articles.find( { _id : 'A' } )

This gets everything with one query. You may have to do some client-side logic to count votes per comment, but this is pretty trivial.

  1. get all comments by User B across all articles
db.articles.ensureIndex( { "comments.user_id" : 1 } )
db.articles.find( { "comments.user_id" : 987654 } ) // returns all document fields

The index will allow for efficiently searching the comments within a document.

There's currently no way to extract only the matches from a sub-array. This query will in fact return all of the articles with comments by that user. If this is potentially way too much data, you can do some trimming.

db.articles.find( { "comments.user_id" : 987654 }, { "title" : 1, "comments.user_id" : 1 })
  1. get all comments User B voted for
db.articles.ensureIndex( { "comments.votes" : 1 } )
db.articles.find( { "comments.votes" : 987654 } )

Again, this will return all of the Articles, not just the comments.

There's a trade-off to be made here. Returning the article may seem like we're bringing back too much data. But what are you planning to display to the user when you make query #3?

Getting a list of "comments I've voted for" is not terribly useful without the comment itself. Of course the comment is not very useful without the article itself (or at least just the title).

Most of the time, query #3 devolves into a join from Votes to Comments to Articles. If that's the case, then why not just bring back the Articles to start with?

Gates VP
  • 44,957
  • 11
  • 105
  • 108
  • 7
    What about the limitation of 16MB / document? – Van Thoai Nguyen Jan 29 '14 at 00:43
  • If you are worried about have a single blog post exceed the 16MB (_that's a lot of text_), then you'll need to make a different schema design. Many blogs actually source this as separate queries and so they don't actually access both pieces at once. If this are your worries, then you pick a different schema. – Gates VP Jan 29 '14 at 02:49
  • I need a similar design for a blog, I need to fetch comments sorted by date from all the articles for moderation. Now this result should also be paginated. Should I still stick to this architecture or take the comments section out into another collection? – Nikhil Bhandari Jul 14 '14 at 17:11
  • If you are planning to query Comments separately from Articles, then it deserves its own collection. Modeling in MongoDB should be around "top-level" objects. If you only plan to query Comments as part of the Article, then you can use the pattern above. If you plan to query Comments separately, then you will want to split them up. (_note that the nesting of votes may still work_) – Gates VP Jul 14 '14 at 19:04
  • Actually I'm also worried about the limitation of a single document. That's why I dare not to put them just like the schema above instead of separate them into two documents and set up simple relations. – Benyi May 31 '17 at 07:28
  • What if I need to list all of a post's comments and name of their writers? Do I need at least two queries? – Tooraj Jam May 29 '20 at 16:59