
Ok, I understand NoSQL databases are all about not using joins for their querying, but I simply can't wrap my head around some concepts. For example, let's say I want to have a blog with multiple authors, and articles related to those authors. In MySQL I'd create a table of users and a table of articles:

Users: id, name, surname, nickname, password...
Articles: id, user_id, title, content, date, tags...

But I'm not sure what would be the best way to set this up properly in MongoDB. Should I just say:

db.users.insert({
    id:1,
    name: "Author name",
    ...
    articles: [{id:1, article:1, title:"Article title", ...}, {...}, ...]
});

Or should I maybe do something like this?

db.articles.insert({
    ...
    article related stuff
    ...
    user related stuff: {...}
});

Or maybe I should have a separate database for articles and a separate database for users?

If I have a homepage that displays the 10 most recent article excerpts along with author data, in MySQL I'd just do a join query to get the author nickname from the authors table, and the title and excerpt from the articles table.

I'm really unsure how to represent my data in a document-oriented database. Maybe I should store author data in each of his articles, but then if an author changes his info, all articles from that author need to be updated.

It seems logical to me to create separate documents in MongoDB: one that will hold all author documents, and one that will hold all article documents. But that again would require some kind of join operation that gets the first 10 articles and then gets the author data from the authors document.

Ok, maybe some map reduce operation, but I'm not sure how it would look.

I'd appreciate your thoughts and advice on this problem of mine. Thanks!

[EDIT] Also, if I hold all articles in one document, there's a limit of 16 MB per document if I'm correct, and that would be a problem for a large website, so I guess there should be a separate database for articles?

Jinx
  • I would assume you have gone through some of these: http://www.mongodb.org/display/DOCS/Schema+Design, http://www.mongodb.org/display/DOCS/MongoDB+Data+Modeling+and+Rails, http://www.10gen.com/presentations/mongosf-2012/mongodb-schema-design-insights-and-tradeoffs, right? – Pavel Veller May 09 '12 at 18:40
  • http://stackoverflow.com/questions/5224811/mongodb-schema-design-for-blogs – Pavel Veller May 09 '12 at 18:41
  • I highly recommend going through any training/presentation called "Schema design" on http://www.10gen.com/presentations so that hopefully you'll understand that the schema design is closely related to your application design and you can't design a good document schema in isolation. – Asya Kamsky May 09 '12 at 18:43
  • @Asya Kamsky You make a good point, but they are constantly talking about design flexibility which I simply can't find anywhere in their examples. – Jinx May 09 '12 at 18:48
  • Flexibility refers to _exactly_ that: if you have two different applications using similar data differently, each can choose the schema that's appropriate to its use pattern. So, like Derick's answer mentions, considering the types of queries your app is going to be making is key to selecting an appropriate schema. The same schema design is not forced on all apps which need authors and articles (or whatever). That's the flexibility. Plus you can evolve your schema as your application evolves, which is more flexible because it doesn't require laborious "schema upgrades". – Asya Kamsky May 09 '12 at 19:45

2 Answers

First, let me correct some of your terminology:

  • db.databaseName.insert({ is incorrect. After you've connected to the database, you insert documents into collections. The line should be written as db.articles.insert({.

  • The maximum document size is 16MB at the moment.

What I would probably do in this case is store all the articles in an articles collection, where one of the fields is the author name (or author nick). The main reason is that you mentioned this is a query you will be running a lot on the homepage. You can then store additional author information in documents in the authors collection. The _id field of each author could simply be the author name (or author nick); it doesn't need to be of type "ObjectId" at all, as long as it's a scalar value (and not an array).
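A minimal sketch of that layout, in plain JavaScript with arrays standing in for the two collections so it runs without a server. The field names (authorNick and so on) and the sample data are made up for illustration, and the shell commands in the comments are only rough equivalents:

```javascript
// Plain arrays stand in for the "authors" and "articles" collections.
// All field names and sample values here are hypothetical.
const authors = [
  { _id: "jdoe", name: "John", surname: "Doe" },
  { _id: "asmith", name: "Anna", surname: "Smith" },
];

const articles = [
  { _id: 1, authorNick: "jdoe", title: "First post", date: new Date("2012-05-01") },
  { _id: 2, authorNick: "asmith", title: "Second post", date: new Date("2012-05-03") },
  { _id: 3, authorNick: "jdoe", title: "Third post", date: new Date("2012-05-07") },
];

// Homepage: newest articles first. The nick is already on each article,
// so no second lookup is needed here. In the mongo shell this is roughly
// db.articles.find().sort({date: -1}).limit(10).
function recentArticles(limit) {
  return [...articles].sort((a, b) => b.date - a.date).slice(0, limit);
}

// Article page: a second lookup by _id fetches the full author record.
// In the mongo shell this is roughly db.authors.findOne({_id: nick}).
function authorFor(article) {
  return authors.find((author) => author._id === article.authorNick);
}

const homepage = recentArticles(10);
const newestAuthor = authorFor(homepage[0]);
console.log(homepage.map((a) => `${a.title} by ${a.authorNick}`));
console.log(newestAuthor.name, newestAuthor.surname);
```

The homepage list needs only the articles, while the second lookup happens only on pages that show more than the nick.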

Alternatively, you could store all the articles by an author as a nested array in that author's document, something like you show in your first example. A 16MB document limit might not sound like much, but it's more than you think. For example, the 477 articles on my blog only take up 2.4MB.

Derick
  • Yes, thanks for the correction. I'm a bit sick right now, so my brain doesn't work as well. I've also managed to turn 16 into 60 somehow. – Jinx May 09 '12 at 19:33
  • But what if a user wants to change their nickname? I'd have to do some kind of bulk update on all articles that contain that nickname. Storing all of a user's articles under one document isn't quite appealing to me, since the papers of several professors at my university exceed 20 MB in raw HTML (they are very enthusiastic). If I store the nickname in the article, and later need some other info about the author to display on the post, would that require 2 queries or can it be done in one (I'm thinking of join queries in SQL here)? – Jinx May 09 '12 at 19:45
  • @Jinx, first of all, don't be afraid to use more than one query. You might want to add all the author items that you often show with an article as an embedded object, but keep other data (such as the login password) out of it. Doing two queries is not a problem and (even in SQL) is sometimes faster than a join. Secondly, you will indeed have to run a bulk update query if an author wants to change his/her nickname, but that's just how it is. – Derick May 09 '12 at 20:11
  • Ok, thanks. I was puzzled by this approach, so I thought I didn't get it at first and that there was a better way. – Jinx May 09 '12 at 20:13

As @Pavel already mentioned, we assume that you have been through http://www.mongodb.org/display/DOCS/Schema+Design.

Schema design is an entirely relative concept in MongoDB, and it differs case by case. How you design the collections, and whether you choose linking or embedding, really depends on your data architecture, the size of the data and how you want to query it.

If your authors' information doesn't take up too much space, I would say embedding the authors' information in the article's document is a good idea. It would be very fast for look-ups, as you can have indexes on articles and also on authors (even if they are embedded).

When an author changes his/her information, updating it collection-wide is easy. You just need to run an update on the articles that have this author in their authors list, in particular by using $ (the positional operator): http://www.mongodb.org/display/DOCS/Updating#Updating-The%24positionaloperator
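To make the effect of such an update concrete, here is a rough simulation in plain JavaScript, with an array standing in for the articles collection. The document shape (an authors array with nick and name fields) and the renameAuthor helper are hypothetical, and the shell command in the comment is only an approximate equivalent:

```javascript
// Hypothetical articles, each embedding a list of its authors.
const embeddedArticles = [
  { _id: 1, title: "First post", authors: [{ nick: "jdoe", name: "John Doe" }] },
  {
    _id: 2,
    title: "Joint post",
    authors: [
      { nick: "asmith", name: "Anna Smith" },
      { nick: "jdoe", name: "John Doe" },
    ],
  },
  { _id: 3, title: "Other post", authors: [{ nick: "asmith", name: "Anna Smith" }] },
];

// Rename an author in every article that lists them and return how many
// articles changed. Rough shell equivalent:
//   db.articles.update({"authors.nick": oldNick},
//                      {$set: {"authors.$.nick": newNick}},
//                      {multi: true})
// Like "$", this touches only the first matching entry in each array.
function renameAuthor(docs, oldNick, newNick) {
  let modified = 0;
  for (const doc of docs) {
    const match = doc.authors.find((author) => author.nick === oldNick);
    if (match) {
      match.nick = newNick;
      modified += 1;
    }
  }
  return modified;
}

const changed = renameAuthor(embeddedArticles, "jdoe", "john.doe");
console.log(changed);
```

Articles that do not list the author are left untouched, so the cost of a rename grows with the number of articles that author appears in.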

But if you are concerned about the size limit, then it's another story. As @Derick mentioned, 16MB is a lot, I mean A LOT. So if you think you are going to reach the limit, go for separate collections and do the linking.

As far as I know, MongoDB doesn't provide MapReduce across multiple collections by default. You might end up doing it in several steps, which would be very resource-consuming.

MapReduce is not well suited to production use. It's best used in a batch process; for real-time aggregation you'd better come up with different solutions (tailored to your needs) and benchmark them. Sometimes it's even faster to find the documents and do the aggregation on the scripting side (Python, PHP, ...).

As a final note, I just want to say that no matter how beautiful, fast and trendy MongoDB and NoSQL in general are, they might not be the answer to all problems. Some problems are best addressed by traditional relational approaches.
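As an illustration of doing the aggregation on the scripting side, here is a small sketch in plain JavaScript that counts articles per author after the documents have been fetched. The fetchedArticles array stands in for the result of an ordinary find(), and its field names are hypothetical:

```javascript
// Hypothetical result of a plain find() on the articles collection.
const fetchedArticles = [
  { title: "First post", authorNick: "jdoe" },
  { title: "Second post", authorNick: "asmith" },
  { title: "Third post", authorNick: "jdoe" },
];

// Count articles per author in application code instead of MapReduce.
function countByAuthor(docs) {
  const counts = {};
  for (const doc of docs) {
    counts[doc.authorNick] = (counts[doc.authorNick] || 0) + 1;
  }
  return counts;
}

const articleCounts = countByAuthor(fetchedArticles);
console.log(articleCounts);
```

Whether this beats a server-side job depends on how many documents have to be transferred, which is why benchmarking both is worthwhile.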

Majid
  • Yes, thanks. You guys cleared up a lot for me. I'm actually studying NoSQL databases now (and I've read the MongoDB documentation) but just wasn't sure if I was getting things right. I don't intend to go with MongoDB for something like a blog (I think MySQL is much better for that from what I understand now), but I plan to create a paper versioning system for my university, and a document-oriented database seemed like a good idea. – Jinx May 09 '12 at 20:43