I'm just learning NoSQL, specifically MongoDB, and more specifically Mongoose under Node, but this is a somewhat agnostic design question.
What I'm seeing in various tutorials is a data design with a two-way link between child and parent: the child holds a reference to its parent, and the parent stores its children as an array of ObjectIds. Mongoose can then pull in the actual child objects with populate(). For example:
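    var mongoose = require('mongoose');

    // Parent: holds an array of references to its comments.
    var PostSchema = new mongoose.Schema({
        title: String,
        comments: [{type: mongoose.Schema.Types.ObjectId, ref: 'Comment'}]
    });

    // Child: holds a reference back to its post.
    var CommentSchema = new mongoose.Schema({
        comment: String,
        post: {type: mongoose.Schema.Types.ObjectId, ref: 'Post'}
    });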
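As far as I understand it, reading a post together with its comments then looks roughly like the sketch below. `Post` is assumed to be the model compiled from `PostSchema`, and `loadPostWithComments` is just a name I made up for illustration.

```javascript
// Sketch only: Post is assumed to be the Mongoose model built from PostSchema.
// loadPostWithComments is a hypothetical helper name, not part of Mongoose.
async function loadPostWithComments(Post, postId) {
  // findById() fetches the post; populate('comments') asks Mongoose to swap
  // each ObjectId in post.comments for the Comment document it refers to.
  return Post.findById(postId).populate('comments');
}
```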
To me this seems to create the following problems:
1) Inserting a new comment now also requires an additional update to the Post record to add the comment id to its comments array. The same is true for deleting a comment.
2) There is no referential integrity; the burden is on the application itself to ensure that no comments get orphaned and no posts contain invalid comment ids.
3) The populate() method is part of Mongoose, not MongoDB, so if I need to access this data with something else, how do I get the child objects out?
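To make points 1 and 2 concrete, here is a rough sketch of what I believe every insert and delete ends up looking like with this design. `Post` and `Comment` are assumed to be the models compiled from the two schemas, and `addComment` and `removeComment` are hypothetical helper names of my own.

```javascript
// Sketch only: Post and Comment are assumed to be the Mongoose models for the
// two schemas; addComment and removeComment are hypothetical helper names.
async function addComment(Post, Comment, postId, text) {
  // Write 1: store the comment with its back-reference to the post.
  const comment = await Comment.create({ comment: text, post: postId });
  // Write 2: add the new comment's id to the post's comments array.
  await Post.updateOne({ _id: postId }, { $push: { comments: comment._id } });
  return comment;
}

async function removeComment(Post, Comment, postId, commentId) {
  // Write 1: delete the comment document.
  await Comment.deleteOne({ _id: commentId });
  // Write 2: pull its id out of the post's comments array.
  await Post.updateOne({ _id: postId }, { $pull: { comments: commentId } });
}
```

Nothing in this sketch ties the two writes together, so if the second one fails, the post and its comments are out of sync and it is up to my code to notice.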
I always understood (perhaps mistakenly) that the benefit of NoSQL was that you could just store a whole object graph as one entity. So without looking at these tutorials, I would have naively stored the "comments" as full objects inside the post, and used a projection to avoid loading them when I didn't need them. Now, having played with it, I don't understand why you wouldn't want to do it that way. I ask my fellow StackOverflowians for edification.
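For concreteness, the embedded version I have in mind would look roughly like this. It is only a sketch: `embeddedPostDefinition`, `addEmbeddedComment` and `loadPostWithoutComments` are names I invented, and `Post` is assumed to be a model compiled from a schema built on that definition.

```javascript
// Sketch only. The definition is a plain object; with Mongoose it would be
// passed to new mongoose.Schema(embeddedPostDefinition). All names here are
// hypothetical.
const embeddedPostDefinition = {
  title: String,
  comments: [{ comment: String }] // full subdocuments instead of ObjectId refs
};

// A single write: the comment is pushed straight into the post document.
async function addEmbeddedComment(Post, postId, text) {
  return Post.updateOne(
    { _id: postId },
    { $push: { comments: { comment: text } } }
  );
}

// The projection { comments: 0 } excludes the embedded comments, so they are
// not loaded when only the post itself is needed.
async function loadPostWithoutComments(Post, postId) {
  return Post.findById(postId, { comments: 0 });
}
```

With this layout there is no second collection to keep in sync, and the comments come back with a plain find on the post from any driver.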