Why, or why not, to use a schema free database?

Question

I've tried using only mongodb in a web application for some time. But I'm wondering why some people say schema-free or dynamic schema is powerful. Now I don't think it so fantastic or wonderful. Would anybody like to talk about the proper case to use schema free databases? First I'd like tell some of my stories.

What is schema free, the database or the codes?

Most of the NoSQL databases would like to say they are schema-free, but I think down to earth the important part is the codes running in the application.

For example, the storage of user information could be schema free, but it doesn't mean that you could store username as an object or store password as an timestamp. The code for user login assumes that username is a string and password is a hash. And eventually that turns the database storage constrained in schema.

Embedded documents are hard to maintain or to query

I created a CMS as the example to start my NoSQL database life. At the beginning the posts and comments data were stored like this

[
{
    title: 'Mongo is Good',
    content: 'Mongo is a NoSQL database.',
    tags: ['Database', 'MongoDB', 'NoSQL'],
    comments: [ COMMENT_0, COMMENT_1, ... ]
},
{
    title: 'Design CMS',
    content: 'Design a blog or something else.',
    tags: ['Web', 'CMS'],
    comments: [ COMMENT_2, COMMENT_3, ... ]
},
...
]

As you see I embedded comments into a list in each post. It was quite convenient as I could easily append new comment to any post or retrieve comments along with the post. But soon I encountered the first problem: it wasis quite messy to delete a certain comment (usually a spam) from the list. To my surprise mongo haven't still implemented it.

Aside that API level problem, it also hard to query embedded document across the collection. If I insisted on that design, the following queries could only implements in brute force ways

recent comments
comments by one certain user

Eventually I had to place comments into another collection, with a post_id field storing the id of a post the comment belongs to, just like an FK we did in a relational database.

Despite the comments design, the post tags are pretty helpful.

I found an opinion in this post

In NoSQL, you don't design your database based on the relationships between data entities. You design your database based on the queries you will run against it.

But how about changes of the requirements? Is it too crasy to restructure a database only because a new query should be supported?

The cases are worth schema free

In some other cases that need schema free storage. For example, a twitter-like timeline, with data in the following format

[
{
    _id: ObjectId('aaa'),
    type: 'tweet',
    user: ObjectId('xxx'),
    content: '0000',
},
{
    _id: ObjectId('bbb'),
    type: 'retweet',
    user: ObjectId('yyy'),
    ref: ObjectId('aaa'),
},
...
]

The problem is it won't be an easy job to render the documents into HTML. I render them in this way (Python)

renderMethods = {
    'tweet': render_tweet,
    'retweet': render_retweet,
}
result = [ render_methods[u['type']](u) for u in updates ]

Because only the JSON data is stored, not with member functions. As the result I have to manually map a render function to each update according to its type. (Similar things would happen when server send the JSON to browser intactly via AJAX)

The above problems confuse me a lot. Would anyone like to tell about the good practice in schema free database, and whether it'swould a good decision to mix one relational database along with a schema free database in a single application?

Your post reads more like a rant against mongodb's schemaless concept and less like a question. I am just not downvoting it because it's such a well-formated and grammatically correct rant :) — Philipp, May 07 '13 at 13:55

Philipp · Answer 1 · 2013-05-08T07:35:46.787

The main strength of schemaless databases comes to light when using them in an object-oriented context with inheritance.

Inheritance means that you have objects which have some attributes in common, but also some attributes which are specific to the sub-type of object.

Imagine, for example, a product catalog for a computer hardware store.

Every product will have the attributes name, vendor and price. But CPUs will have a clock_rate, hard drives will have a capacity, RAM both capacity and clock_rate and network cards a bandwidth. Doing this in a relational database leaves you two options which are equally cumbersome:

create a table with fields for all possible attributes, but leave most of them NULL for products where they don't apply.
create a secondary table "product_attributes" with productId, attribute_name and attribute_value.

A schemaless database, on the other hand, easily allows to store items in the same collection which have different sets of optional properties. The code to render the product attributes to HTML would then check for the existence of each known optional property and then call an appropriate function which outputs its value as a table row.

Another advantage of schemaless databases is that it gives additional agility during development. It easily allows you to try new features without having to restructure your database. This makes it very easy to maintain backward compatibility to data created by a previous version of the application without having to run complicated database conversion routines. I am currently developing an MMORPG using MongoDB. During the development I added lots of new features which required new data about each character to be persisted on the database. I never had to run a single command equivalent to CREATE TABLE or ALTER TABLE on my database. MongoDB just ate and spewed out whatever data I threw at it. My first test character is still playable, although I never did any intentional upgrading to its database document. It has some obsolete fields which are remnants from features I discarded or refactored, but these don't hurt at all - these obsolete fields would be useful though when my players would scream for bringing back a feature I removed.

Interesting. I think game dev is quite different from web dev. It needs various queries and statistics across the whole database in a web application. And maybe because of that the relational databases seem to be more suitable. — neuront, May 08 '13 at 01:04
@neutront Queries and statistics are also important for game development - they can be used to get valuable information about player behavior which is important for balancing. When you have 200 different weapons and a query shows that 99% of all players with over 50 total play-hours use the same, something is wrong. — Philipp, May 08 '13 at 07:38

score 0 · Answer 2 · answered May 18 '13 at 12:28

0

Doing this in a relational database leaves you two options which are equally cumbersome:

There is one more option i guess here. Adding the data in xml format and let the application deserialize/serialize it the way it wants.

answered May 18 '13 at 12:28

Shailesh Pratapwar

4,054
3
35
46

Why, or why not, to use a schema free database?

What is schema free, the database or the codes?

Embedded documents are hard to maintain or to query

The cases are worth schema free

2 Answers2