
What are the recommended best practices for a MongoDB/Mongoose schema that stores large amounts of user data securely?

Each User model needs the usual fields (name, email, etc.), and each user could have a large number of associated Content records. (Think of a note-taking app like Evernote.) Each Content document needs the usual metadata, such as the dates created and updated, as well as text content (subject, body) and perhaps binary attachments (which, for the sake of this question, we can assume will be stored outside the database, so we only need to store a file locator). The body text could be quite large.

Ideally, each user's Content is stored separately from every other user's data. I want to avoid any possibility that a simple mistake in a database query could expose one user's content to another. Also, when a user wants their data deleted, we need to make sure all of their data is removed, and only their data. Furthermore, someday the data will need to be sharded by user. I think this means each user's content needs to be in a separate document or collection.
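One common way to approach the isolation and deletion requirements, sketched here as an assumption rather than something proposed in this thread, is to give every Content document a hypothetical `owner` field and build every query filter through a single helper, so the owner condition can't be forgotten. The helper names `ownedBy` and `deleteAllFor` are illustrative only; they return plain MongoDB filter objects that could be passed to Mongoose calls such as `Content.find(...)` or `Content.deleteMany(...)`.

```javascript
// Hypothetical helper: every Content filter is built from the owner's id,
// so a query cannot be issued without the per-user restriction.
function ownedBy(ownerId, filter = {}) {
  if (ownerId === undefined || ownerId === null || ownerId === '') {
    throw new Error('ownerId is required for every Content query');
  }
  // The owner condition is spread last, so a caller-supplied
  // `owner` key cannot override it.
  return { ...filter, owner: ownerId };
}

// Filter for "delete all of this user's content, and only theirs".
function deleteAllFor(ownerId) {
  return ownedBy(ownerId);
}

// Example filters (e.g. for Content.find / Content.deleteMany):
const recent = ownedBy('user-123', { updatedAt: { $gte: new Date('2017-01-01') } });
const hijack = ownedBy('user-123', { owner: 'user-999' }); // owner forced back to user-123
const wipe = deleteAllFor('user-123');
```

This only protects queries that actually go through the helper. The same `owner` value is also what a per-user shard key would most likely be based on, though that choice would need its own evaluation.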

Option 1, by reference, doesn't achieve these goals, yet it is the most common example I've seen:

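var mongoose = require('mongoose');
var Schema = mongoose.Schema; // needed for Schema.Types.ObjectId below

var UserSchema = new Schema({
  email: { type: String, required: true },
  name: { given: String, family: String },
  // ids of this user's Content documents
  content: [{ type: Schema.Types.ObjectId, ref: 'Content' }]
  // ...
});

var ContentSchema = new Schema({
  // _id left as the default ObjectId so it matches the ref type above
  // (a Number _id could not be stored in the ObjectId array)
  subject: String,
  encryptedBody: String
  // ...
});

// Export both models; assigning module.exports twice keeps only the last one
module.exports = {
  User: mongoose.model('User', UserSchema),
  Content: mongoose.model('Content', ContentSchema)
};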
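A small sketch of why Option 1 has trouble with the "delete all of their data" goal: the only link between a user and their notes is the `User.content` array, so "this user's content" has to be expressed as an `$in` over that array. The in-memory array below is a hypothetical stand-in for the Content collection, used only to show which documents such a filter reaches, not how MongoDB evaluates it.

```javascript
// With Option 1, a Content document does not know its owner;
// the user's notes are whatever ids happen to be in user.content.
function contentOfUser(user) {
  return { _id: { $in: user.content } };
}

// Hypothetical plain-object stand-ins for documents loaded with .lean()
const user = { _id: 'u1', content: ['c1', 'c2'] };
const filter = contentOfUser(user); // e.g. Content.deleteMany(filter)

// In-memory stand-in for the Content collection.
const contentCollection = [
  { _id: 'c1', subject: 'Groceries' },
  { _id: 'c2', subject: 'Ideas' },
  { _id: 'c3', subject: 'Draft' } // created for u1, but its id was never pushed onto u1.content
];

// Documents the $in filter would reach: c3 is silently left behind.
const reached = contentCollection.filter(doc => filter._id.$in.includes(doc._id));
```

If the array ever falls out of sync with the Content collection (a failed second write, for example), a delete driven by it leaves orphaned notes.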

Option 2, embed the content in the User schema:

var mongoose = require('mongoose');

var UserSchema = new mongoose.Schema({
  email: { type: String, required: true },
  name: { given: String, family: String },
  // ...
  // each note is an embedded subdocument inside the user's own document
  content: [{
    subject: String,
    encryptedBody: String
    // ...
  }]
});

module.exports = mongoose.model('User', UserSchema);
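One constraint worth quantifying for Option 2: MongoDB caps a single BSON document at 16 MB (16 * 1024 * 1024 bytes), and with embedding every note a user owns lives inside one User document. The arithmetic below is a rough, hypothetical estimate; the per-note sizes and the 1 KB allowance for the user's own fields are assumptions, and real BSON overhead (field names, type bytes) would lower the numbers further.

```javascript
// MongoDB's maximum BSON document size: 16 MB.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Rough upper bound on how many notes fit in one User document under Option 2.
// Ignores BSON field-name and type overhead, so the true limit is lower.
function maxEmbeddedNotes(avgNoteBytes, userFieldsBytes = 1024) {
  return Math.floor((MAX_BSON_BYTES - userFieldsBytes) / avgNoteBytes);
}

const smallNotes = maxEmbeddedNotes(2 * 1024);   // ~2 KB notes   -> 8191
const largeNotes = maxEmbeddedNotes(200 * 1024); // ~200 KB notes -> 81
```

Since the question says body text "could be quite large", the second case suggests that embedding every note could hit the document limit for heavy users.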

Is there another way?
Is one better than the other in terms of performance? (Assume reads far outweigh writes.) Any thoughts about indexing the Content subject field? Thanks in advance for your thoughts.
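On indexing `subject`, a hedged sketch: the key patterns below are what one might declare with `ContentSchema.index(...)` or `UserSchema.index(...)` in Mongoose. The Option 1 pattern assumes a hypothetical `owner` field on Content; the Option 2 pattern would be a multikey index over the embedded array. The filter builder shows one query shape that can use such an index: MongoDB documents that a case-sensitive regex anchored with `^` (a "prefix expression") can be matched against a range of index values. `escapeRegex` and `subjectPrefixFilter` are illustrative names, not library functions.

```javascript
// Index key patterns (assumptions, not tested advice).
const option1Index = { owner: 1, subject: 1 };  // requires a hypothetical owner field
const option2Index = { 'content.subject': 1 };   // multikey index on embedded notes

// Escape regex metacharacters so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Case-sensitive, ^-anchored prefix search on subject, scoped to one owner.
function subjectPrefixFilter(ownerId, prefix) {
  return { owner: ownerId, subject: { $regex: '^' + escapeRegex(prefix) } };
}

const byPrefix = subjectPrefixFilter('user-123', 'Q1 (draft)');
// e.g. Content.find(byPrefix).sort({ subject: 1 })
```

A case-insensitive or unanchored regex would generally not get this range optimization, so how subjects are searched matters as much as whether they are indexed.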

Bryan
  • I would do option 1. There is a fuller description here: http://stackoverflow.com/questions/5373198/mongodb-relationships – mrtaz Jan 22 '17 at 01:41
  • @mrtaz Thank you for your opinion and the link to SO. In that Q&A I found a link to http://openmymind.net/Multiple-Collections-Versus-Embedded-Documents/, which is also informative. I was leaning toward Option 2 to keep all of a user's private data together in one document, but I just read that it's difficult or impossible to sort on the embedded content. That's a killer. Do you know about this limitation? – Bryan Jan 22 '17 at 02:13
  • Well, if you are going to be making constant queries against your embedded documents and subdocuments, then separating with option 2 is fine. It does become difficult to grab certain info, such as content from a specific user at a specific time. If this is your premise, then option 2 is more viable. Check out http://seanhess.github.io/2012/02/01/mongodb_relational.html – mrtaz Jan 22 '17 at 03:12
  • Another great read. Thanks for the link. "Don’t fight the Mongo. MongoDB just lets you get stuff done. Don’t become a NoSQL or Document Store purist, just write code that works. It’s the mongo way. It’s easy to store relationships in a separate collection, and the joins are pretty cheap if you don’t split up your data too much. Don’t be overly tempted to store everything in a nested document, because at least in my experience, you end up needing to query against them sooner rather than later." – Bryan Jan 22 '17 at 03:51
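On the sorting concern raised in the comments: with Option 2, one way to get a user's embedded notes back in date order is an aggregation pipeline that unwinds the array and sorts the resulting documents. This is a sketch under assumptions: it assumes each embedded note has an `updatedAt` field (hidden behind the `...` in the Option 2 schema), and `sortedNotesPipeline` is an illustrative name. The stages used (`$match`, `$unwind`, `$sort`, `$limit`, `$replaceRoot`) are standard MongoDB aggregation stages.

```javascript
// Hypothetical pipeline for Option 2: one user's embedded notes, newest first.
function sortedNotesPipeline(userId, limit = 20) {
  return [
    { $match: { _id: userId } },                // only this user's document
    { $unwind: '$content' },                    // one result per embedded note
    { $sort: { 'content.updatedAt': -1 } },     // assumes notes carry updatedAt
    { $limit: limit },
    { $replaceRoot: { newRoot: '$content' } }   // emit the notes themselves
  ];
}

const pipeline = sortedNotesPipeline('u1', 5);
// e.g. User.aggregate(pipeline)
```

Mongoose does not cast aggregation stages against the schema, so in practice `userId` would need to be an actual ObjectId rather than a string. Every such query still reads the whole user document, which is part of the read-performance trade-off the question asks about.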
