It has taken me quite a long (calendar) time to get my head around CouchDB and map/reduce and how I can use them for various use cases. One challenge I've set myself is understanding how to use CouchDB effectively with normalized data. Sources all over the internet simply stop at "don't use it for normalized data." I do not like the lack of analysis on how to use it effectively with normalized data!
Some of the better resources I've found are below:
CouchDB: Single document vs "joining" documents together http://www.cmlenz.net/archives/2007/10/couchdb-joins
In both cases, the authors do a great job of explaining how to do a "join" when the documents being joined share some denormalized data. If, however, I need to join more than two normalized "tables", the view collation tricks used to pull one "row" of data together in a single query do not work. That is, it seems every document that participates in the join needs to carry some data about all the other elements in the join, and thus the data is not normalized!
Consider the following simple Q&A example (question/answer/answer comment):
{ id: "Q1", type: "question", question: "How do I...?" }
{ id: "A1", type: "answer", answer: "Simple... You just..." }
{ id: "C1", type: "answer-comment", comment: "Great... But what about...?" }
{ id: "C2", type: "answer-comment", comment: "Great... But what about...?" }
{ id: "QA1", type: "question-answer-relationship", q_id:"Q1", a_id:"A1" }
{ id: "AC1", type: "answer-comment-relationship", a_id:"A1", c_id:"C1" }
{ id: "AC2", type: "answer-comment-relationship", a_id:"A1", c_id:"C2" }
{ id: "Q2", type: "question", question: "What is the fastest...?" }
{ id: "A2", type: "answer", answer: "Do it this way..." }
{ id: "C3", type: "answer-comment", comment: "Works great! Thanks!" }
{ id: "QA2", type: "question-answer-relationship", q_id:"Q2", a_id:"A2" }
{ id: "AC3", type: "answer-comment-relationship", a_id:"A2", c_id:"C3" }
I want to get one question, its answer, and all of its answer's comments, and no other records from the database, with only one query.
With the data set above, at a high level, you'd need to have views for each record type, ask for a particular question with an id in mind, then in another view, use the question id to look up relationships specified by the question-answer-relationship type, then in another view look up the answer by the id obtained by the question-answer-relationship type, and so on and so forth, aggregating the "row" over a series of requests.
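To make the cost of that approach concrete, here is a rough sketch of the request chain, run against an in-memory copy of the sample documents. The `queryView` helper and the `requests` counter are hypothetical stand-ins for one view query (one HTTP round trip) each; they are not part of any CouchDB API, and a real client would likely look different.

```javascript
// In-memory stand-in for the database, using the sample documents above.
const docs = [
  { id: "Q1", type: "question", question: "How do I...?" },
  { id: "A1", type: "answer", answer: "Simple... You just..." },
  { id: "C1", type: "answer-comment", comment: "Great... But what about...?" },
  { id: "C2", type: "answer-comment", comment: "Great... But what about...?" },
  { id: "QA1", type: "question-answer-relationship", q_id: "Q1", a_id: "A1" },
  { id: "AC1", type: "answer-comment-relationship", a_id: "A1", c_id: "C1" },
  { id: "AC2", type: "answer-comment-relationship", a_id: "A1", c_id: "C2" },
  { id: "Q2", type: "question", question: "What is the fastest...?" },
  { id: "A2", type: "answer", answer: "Do it this way..." },
  { id: "C3", type: "answer-comment", comment: "Works great! Thanks!" },
  { id: "QA2", type: "question-answer-relationship", q_id: "Q2", a_id: "A2" },
  { id: "AC3", type: "answer-comment-relationship", a_id: "A2", c_id: "C3" },
];

// Counts how many simulated round trips are made.
let requests = 0;

// Hypothetical helper: stands in for querying a per-type view keyed on `field`.
function queryView(type, field, key) {
  requests += 1;
  return docs.filter((d) => d.type === type && d[field] === key);
}

// Assembles one "row" (question, answer, comments) over a series of lookups.
function loadQuestionRow(questionId) {
  const [question] = queryView("question", "id", questionId);
  const [qa] = queryView("question-answer-relationship", "q_id", questionId);
  const [answer] = queryView("answer", "id", qa.a_id);
  const links = queryView("answer-comment-relationship", "a_id", answer.id);
  // One more lookup per comment.
  const comments = links.map((l) => queryView("answer-comment", "id", l.c_id)[0]);
  return { question, answer, comments };
}

const row = loadQuestionRow("Q1");
console.log(row, "requests:", requests);
```

For question Q1 this sketch makes six lookups (four to walk the relationships plus one per comment), which is exactly the aggregation work I'd like the database to do instead.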
Another option might be to create some sort of application that performs the process above and caches denormalized documents in the desired format, automatically reacting when the normalized data is updated. This feels awkward and like a reimplementation of something that already exists or should exist.
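As a minimal sketch of what such a caching application might compute, the function below rebuilds one denormalized document per question from the normalized ones. The `denormalize` function, the `question-cache` type, and the `cache-` id prefix are all made up for illustration; I am assuming the application would re-run something like this whenever it notices a change (for example by watching CouchDB's `_changes` feed) and write the results back.

```javascript
// A subset of the sample documents, enough to show two questions.
const normalized = [
  { id: "Q1", type: "question", question: "How do I...?" },
  { id: "A1", type: "answer", answer: "Simple... You just..." },
  { id: "C1", type: "answer-comment", comment: "Great... But what about...?" },
  { id: "C2", type: "answer-comment", comment: "Great... But what about...?" },
  { id: "QA1", type: "question-answer-relationship", q_id: "Q1", a_id: "A1" },
  { id: "AC1", type: "answer-comment-relationship", a_id: "A1", c_id: "C1" },
  { id: "AC2", type: "answer-comment-relationship", a_id: "A1", c_id: "C2" },
  { id: "Q2", type: "question", question: "What is the fastest...?" },
  { id: "A2", type: "answer", answer: "Do it this way..." },
  { id: "C3", type: "answer-comment", comment: "Works great! Thanks!" },
  { id: "QA2", type: "question-answer-relationship", q_id: "Q2", a_id: "A2" },
  { id: "AC3", type: "answer-comment-relationship", a_id: "A2", c_id: "C3" },
];

// Hypothetical denormalizer: builds one cached document per
// question-answer relationship, embedding the answer and its comments.
function denormalize(docs) {
  const byId = new Map(docs.map((d) => [d.id, d]));
  const ofType = (t) => docs.filter((d) => d.type === t);

  return ofType("question-answer-relationship").map((qa) => {
    const comments = ofType("answer-comment-relationship")
      .filter((ac) => ac.a_id === qa.a_id)
      .map((ac) => byId.get(ac.c_id).comment);
    return {
      id: "cache-" + qa.q_id, // made-up id scheme
      type: "question-cache", // made-up document type
      question: byId.get(qa.q_id).question,
      answer: byId.get(qa.a_id).answer,
      comments,
    };
  });
}

const cached = denormalize(normalized);
console.log(JSON.stringify(cached, null, 2));
```

Each cached document could then be fetched with a single request, but the application has to keep it in sync with the normalized data, which is the part that feels like a reimplementation.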
After all of this background, the ultimate question is: Is there a better way to do this so the database, rather than the application, does the work?
Thanks in advance to anyone sharing their experience!