I've been using only MongoDB in a web application for some time, but I still wonder why some people say that schema-free or dynamic-schema storage is so powerful. So far I don't find it fantastic or wonderful. Would anybody like to talk about the proper cases for using a schema-free database? First, I'd like to tell some of my stories.
What is schema-free, the database or the code?
Most NoSQL databases like to call themselves schema-free, but I think that, down to earth, the part that matters is the code running in the application.
For example, the storage of user information could be schema-free, but that doesn't mean you could store the username as an object or the password as a timestamp. The code for user login assumes that the username is a string and the password is a hash. Eventually that constrains the database storage to a schema anyway.
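To make the point concrete, here is a minimal sketch of the kind of check that login code implicitly relies on. The function name `validate_user` and the assumption that the password hash is a 64-character hex string are hypothetical and only for illustration; they are not part of my actual application.

```python
import string


def validate_user(doc):
    """Return a list of problems; an empty list means the login code can use the document."""
    problems = []
    if not isinstance(doc.get("username"), str):
        problems.append("username must be a string")
    password_hash = doc.get("password")
    # Assumption for this sketch: the hash is stored as 64 hex characters.
    is_hex_digest = (
        isinstance(password_hash, str)
        and len(password_hash) == 64
        and all(c in string.hexdigits for c in password_hash)
    )
    if not is_hex_digest:
        problems.append("password must be a 64-character hex digest")
    return problems


# The database would accept both documents, but the login code only works with the first.
good = {"username": "alice", "password": "ab" * 32}
bad = {"username": {"first": "alice"}, "password": 1700000000}
print(validate_user(good))
print(validate_user(bad))
```

In other words, the schema has not disappeared; it has only moved from the database into the application.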
Embedded documents are hard to maintain and to query
I built a CMS as the example project to start my NoSQL database life. At the beginning, the posts and comments were stored like this:
[
{
title: 'Mongo is Good',
content: 'Mongo is a NoSQL database.',
tags: ['Database', 'MongoDB', 'NoSQL'],
comments: [ COMMENT_0, COMMENT_1, ... ]
},
{
title: 'Design CMS',
content: 'Design a blog or something else.',
tags: ['Web', 'CMS'],
comments: [ COMMENT_2, COMMENT_3, ... ]
},
...
]
As you can see, I embedded the comments as a list inside each post. This was quite convenient, since I could easily append a new comment to any post or retrieve the comments along with the post. But soon I ran into the first problem: it is quite messy to delete one particular comment (usually spam) from the list. To my surprise, Mongo still hasn't implemented this.
Aside from that API-level problem, it is also hard to query embedded documents across the collection. If I had insisted on that design, the following queries could only have been implemented in a brute-force way:
- recent comments
- comments by one certain user
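By "brute force" I mean something like the following sketch, which walks every post to collect its comments. It works on plain in-memory dictionaries rather than a real Mongo collection, and the comment fields (`user`, `text`, `created`) are assumptions, since the layout of each comment is not shown above.

```python
# Hypothetical sample data in the embedded layout: comments live inside each post.
posts = [
    {"title": "Mongo is Good", "comments": [
        {"user": "alice", "text": "Nice", "created": 3},
        {"user": "bob", "text": "Spam", "created": 1},
    ]},
    {"title": "Design CMS", "comments": [
        {"user": "alice", "text": "Agreed", "created": 2},
    ]},
]


def all_comments(posts):
    """Yield (post title, comment) pairs by scanning every post."""
    for post in posts:
        for comment in post.get("comments", []):
            yield post["title"], comment


def recent_comments(posts, limit):
    """Newest comments first; every post has to be visited to answer this."""
    pairs = sorted(all_comments(posts), key=lambda pair: pair[1]["created"], reverse=True)
    return pairs[:limit]


def comments_by_user(posts, user):
    """All comments written by one user, again found by a full scan."""
    return [pair for pair in all_comments(posts) if pair[1]["user"] == user]
```

Both functions touch every post in the collection, however few comments actually match.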
Eventually I had to move the comments into another collection, with a post_id field storing the id of the post each comment belongs to, just like a foreign key in a relational database.
Unlike the embedded comments, the embedded post tags have been pretty helpful.
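A rough sketch of the layout I ended up with, again using plain Python lists in place of a real collection. The field names other than `post_id` and the helper names are hypothetical.

```python
# Hypothetical sample data: one flat list of comments, each pointing at its post.
comments = [
    {"_id": 1, "post_id": "p1", "user": "alice", "text": "Nice", "created": 3},
    {"_id": 2, "post_id": "p1", "user": "bob", "text": "Spam", "created": 1},
    {"_id": 3, "post_id": "p2", "user": "alice", "text": "Agreed", "created": 2},
]


def comments_for_post(comments, post_id):
    """Comments of one post, looked up through the post_id reference."""
    return [c for c in comments if c["post_id"] == post_id]


def latest_comments(comments, limit):
    """Newest comments across all posts, without touching the posts themselves."""
    return sorted(comments, key=lambda c: c["created"], reverse=True)[:limit]


def without_comment(comments, comment_id):
    """Deleting one comment is now just removing one document."""
    return [c for c in comments if c["_id"] != comment_id]
```

With the comments flattened like this, the recent-comments and per-user queries no longer need to walk the posts, and removing a spam comment means deleting a single document.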
I found an opinion in this post
In NoSQL, you don't design your database based on the relationships between data entities. You design your database based on the queries you will run against it.
But what about changes in the requirements? Isn't it too crazy to restructure a database only because a new query has to be supported?
Cases where schema-free is worth it
There are other cases that do need schema-free storage. One example is a Twitter-like timeline, with data in the following format:
[
{
_id: ObjectId('aaa'),
type: 'tweet',
user: ObjectId('xxx'),
content: '0000',
},
{
_id: ObjectId('bbb'),
type: 'retweet',
user: ObjectId('yyy'),
ref: ObjectId('aaa'),
},
...
]
The problem is that rendering these documents into HTML is not an easy job. I render them this way (Python):
# Map each update type to the function that renders it.
render_methods = {
    'tweet': render_tweet,
    'retweet': render_retweet,
}
result = [render_methods[u['type']](u) for u in updates]
Only the JSON data is stored, without any member functions, so I have to manually map a render function to each update according to its type. (Something similar happens when the server sends the JSON to the browser unchanged via AJAX.)
The problems above confuse me a lot. Would anyone like to explain good practice for schema-free databases, and whether it would be a good decision to mix a relational database with a schema-free database in a single application?
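For completeness, here is a self-contained sketch of that dispatch. The render functions, the HTML they produce, and the fallback for unknown types are all hypothetical stand-ins for my real templates, and plain strings are used in place of ObjectId values.

```python
from html import escape


def render_tweet(update):
    # Hypothetical markup; a real application would use a template.
    return "<p>{}</p>".format(escape(update["content"]))


def render_retweet(update):
    return "<p>Retweet of {}</p>".format(escape(str(update["ref"])))


def render_unknown(update):
    # Fallback so that a document with an unexpected type does not raise KeyError.
    return "<!-- unknown update type -->"


RENDER_METHODS = {
    "tweet": render_tweet,
    "retweet": render_retweet,
}


def render_updates(updates):
    """Pick a render function for each update based on its 'type' field."""
    return [RENDER_METHODS.get(u.get("type"), render_unknown)(u) for u in updates]


updates = [
    {"_id": "aaa", "type": "tweet", "user": "xxx", "content": "0000"},
    {"_id": "bbb", "type": "retweet", "user": "yyy", "ref": "aaa"},
    {"_id": "ccc", "type": "poll"},
]
print(render_updates(updates))
```

The fallback is the part a schema would otherwise give me for free: since nothing stops a document with a new `type` from being stored, the rendering code has to decide what to do with it.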