
I've been getting into MongoDB, but coming from an RDBMS background I'm facing the probably obvious questions about denormalisation and general data modelling.

Suppose I have a document type with an array of sub-documents, and each sub-document has a status code.

In the relational world I would simply add a StatusId foreign key to the record. In MongoDB, would you denormalise the key pieces of data from the status (e.g. code and description) and also hold an ObjectId referencing another collection of proper status documents? I guess the next question is one of design: if the status document is modified, I'd then need to modify the denormalised data?
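To make the trade-off concrete, here is a minimal sketch (plain Python dicts standing in for BSON documents; all field and collection names are illustrative) of a sub-document that carries both a reference to the canonical status and a denormalised copy of its key fields:

```python
# Canonical status document, living in its own `statuses` collection.
# _id values would be ObjectIds in practice; strings keep the sketch simple.
status = {"_id": "status-open", "code": 10, "desc": "Open"}

# A document with an array of sub-docs; each sub-doc holds a reference
# back to `statuses` plus denormalised copies of code and desc.
order = {
    "_id": "order-1",
    "items": [
        {
            "sku": "A-1",
            "status_id": "status-open",  # reference, for updates/lookups
            "status_code": 10,           # denormalised copy
            "status_desc": "Open",       # denormalised copy
        }
    ],
}

# Reads need no second query: the key status fields travel with the item.
assert order["items"][0]["status_desc"] == status["desc"]
```

The denormalised copies make reads cheap; the `status_id` reference is what lets you find and refresh those copies if the canonical status ever changes.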

Another question on the same theme: how would you model a transaction table? Say I have events and people; the events could be quite granular, say time sheets, which over time may lead to many records. Based on what I've seen, this would seem like a good candidate for a child/sub-array of documents, which could of course be indexed for speed.

So is it possible to query/find just the sub-array, or part of it? And given the 16 MB limit on document size, am I just limited in how much transaction history a person can have? Or should the transaction history be a separate collection with an ObjectId referencing the person?
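For the time-sheet case, the usual alternative to an embedded array is a separate events collection where each event references the person. A sketch of the two shapes involved (illustrative names, plain dicts in place of BSON documents):

```python
# One doc per person; it stays small no matter how many events accrue.
person = {"_id": "person-1", "name": "Sam"}

# One doc per time-sheet event, each referencing the person.
# In MongoDB this collection would carry an index on person_id.
events = [
    {"_id": "evt-1", "person_id": "person-1", "day": "2012-02-01", "hours": 8},
    {"_id": "evt-2", "person_id": "person-1", "day": "2012-02-02", "hours": 6},
    {"_id": "evt-3", "person_id": "person-2", "day": "2012-02-01", "hours": 7},
]

# Equivalent of db.events.find({"person_id": person["_id"]}):
history = [e for e in events if e["person_id"] == person["_id"]]
assert [e["_id"] for e in history] == ["evt-1", "evt-2"]
```

Because each event is its own document, the per-person history can grow indefinitely without the person document ever approaching the 16 MB limit.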

Thanks for any input

Sam

sambomartin

3 Answers


Or should the transaction history be a separate collection with an ObjectId referencing the person?

Probably. I think this S/O question may help you understand why.

if the status doc is modified I'd then need to modify the denormalised data?

Yes, this is a standard trade-off in MongoDB. You will encounter this question a lot. You may need to use a queue structure to ensure that the data remains consistent across multiple collections.
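As a sketch of what such a consistency pass might look like, here is a pure-Python version run over plain dicts (a real queue worker would issue the equivalent `update_many` against the collection; the field names follow no particular schema and are assumptions):

```python
def propagate_status_change(docs, status_id, new_desc):
    """Bring denormalised status copies in line with the canonical doc.
    Returns how many embedded copies were touched."""
    touched = 0
    for doc in docs:
        for sub in doc.get("items", []):
            if sub.get("status_id") == status_id:
                sub["status_desc"] = new_desc
                touched += 1
    return touched

orders = [
    {"_id": "o1", "items": [{"status_id": "s1", "status_desc": "Open"}]},
    {"_id": "o2", "items": [{"status_id": "s2", "status_desc": "Closed"}]},
]
assert propagate_status_change(orders, "s1", "Reopened") == 1
assert orders[0]["items"][0]["status_desc"] == "Reopened"
```

The point of routing this through a queue is that the canonical update and the fan-out to denormalised copies happen as separate steps, so a crash mid-way can be detected and the pass re-run.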

Is it possible to query / find just the sub-array or part of it?

This is a tough one specific to MongoDB. With the basic query syntax, you have only limited support for dealing with arrays of objects. The new Aggregation Framework is actually much better here, but at the time of writing it's not available in a stable build.
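For reference, the basic query syntax can match on arrays of sub-documents; what it cannot do is return an arbitrary filtered subset of an array. The filters below are shown as Python dicts, as a driver such as pymongo would send them (collection and field names are illustrative):

```python
# Match docs where SOME item has code 10 and SOME item has sku "A-1"
# (the two conditions may be satisfied by different array elements):
naive = {"items.status_code": 10, "items.sku": "A-1"}

# Match docs where a SINGLE array element satisfies both conditions:
strict = {"items": {"$elemMatch": {"status_code": 10, "sku": "A-1"}}}

assert "$elemMatch" in strict["items"]
```

Either way, the matching documents come back with the whole `items` array; trimming it down to just the matching elements is where the aggregation framework earns its keep.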

Gates VP

Your "how do I model this or that" questions can't really be answered definitively, because good schema design depends on so many factors (access patterns, hardware characteristics, whether a cluster is used, etc.).

if the status doc is modified I'd then need to modify the denormalised data?

Usually yes, that's the drawback of denormalisation. But sometimes you don't have to (some social network sites store the user's name with a photo tag and don't update it when the user changes their name).

to query / find just the sub array or part of it?

It is not currently possible to fetch only part of an array (unless you use map/reduce, of course).
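One narrow exception worth knowing about: the `$slice` projection operator can return a positional window of an array (first/last N, or skip/limit), though not a subset filtered by condition. Shown as pymongo-style projection dicts, with illustrative field names:

```python
# Projections a driver would send alongside the query filter:
last_ten = {"events": {"$slice": -10}}       # only the last 10 elements
page_two = {"events": {"$slice": [10, 10]}}  # skip 10, return the next 10

assert last_ten["events"]["$slice"] == -10
```

That covers pagination-style reads of an embedded history, but anything like "only the elements with status X" still needs map/reduce or the aggregation framework.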

And given the 4mb limit

Where did you get this from? It's 16 MB at the moment.

Sergio Tulentsev
  • Thanks Sergio, of course I wasn't asking for a definitive answer - more really after others' experience with Mongo/NoSQL. The 4mb figure was from a recent web cast I'd seen; it's obviously a bit old (sorry) – sambomartin Feb 03 '12 at 22:48

While it's true that schema design takes many factors into account, the need to denormalize data usually comes up somewhere. I tend to take advantage of denormalization in my apps that use MongoDB because I feel it lends itself well to storing denormalized data:

  • no additional column maintenance
  • support for hashes and arrays as field types (perfect for storing denormalized fields)
  • speedy, non-blocking writes make syncing data less expensive
  • document size growth only marginally affects performance up to limits (for the most part)
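For instance, a denormalised author snapshot (a hash) and a tag list (an array) can sit directly on a post document, so rendering it needs no extra queries (all names here are illustrative, not from any particular schema):

```python
post = {
    "_id": "post-1",
    "author_id": "user-9",                          # reference for re-syncing
    "author": {"name": "Sam", "avatar": "s.png"},   # hash copied at write time
    "tags": ["mongodb", "schema-design"],           # denormalised labels
}

# Rendering the post touches no other collection.
assert post["author"]["name"] == "Sam"
assert "mongodb" in post["tags"]
```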

There are a few gems that help you manage denormalized data, including setting it up and keeping it in sync. If you're using Mongoid, you can try mongoid_alize. DISCLAIMER: I am the author and maintainer of mongoid_alize.

Josh Dzielak