1

I have a database in MongoDb that contains two collections: 'categories' and 'articles'. I'm using Mongoose on NodeJs to connect to the database and read the categories. I want to calculate the number of articles for a category without making an additional request/query, so it would be perfect if I could solve this at the database level.

An item from the 'categories' collection looks like:

{
'_id' : ObjectId("..."),
'feed_id' : 1,
'name': 'Blog posts'
}

An item from the 'articles' collection looks like:

{
'_id' : ObjectId("..."),
'feed_id' : 1,
'title': 'Article title',
'published' : '12/09/2012',
...
}

so the categories and articles are linked using the 'feed_id' field.

I would like to export all categories together with a corresponding number of articles:

{
'_id' : ObjectId("..."),
'feed_id' : 1,
'name': 'Blog posts',
'no_articles': 4
}

I'm not sure how exactly I should do this:

1) Create a 'no_articles' field in the categories collection? If yes, I would like this to be updated automatically when a document is inserted or deleted from the articles collection.

2) Sum up the articles into 'no_articles' when categories are read?

I read something about MapReduce and group, but didn't quite understand if it's possible to use them for this particular task.

ekad
  • 14,436
  • 26
  • 44
  • 46
alex.ac
  • 1,053
  • 2
  • 9
  • 23

2 Answers2

1

This is one of use cases where traditional relational databases really shine.

It is impossible to do that with one query in mongodb. The "no_articles field" you mentioned is the way to go. Common name (among Rails people, anyway) for this approach is: Counter Cache Column. I am not very familiar with Mongoose, so I don't know whether it will maintain that field for you or not. MongoDB itself certainly won't do it. But maintaining it yourself isn't a lot of work, you just need to be accurate.

I advise against counting articles when you read categories. This is a classic example of N+1 query problem and counter cache column is there to prevent it.

Community
  • 1
  • 1
Sergio Tulentsev
  • 226,338
  • 43
  • 373
  • 367
  • I think Counter Cache Column is specific to Rails, it doesn't seem like it is supported by Mongoose. In any case, I will do like you suggested and modify the 'no_articles' fields from the 'categories' collection whenever I update the 'articles' collection. Thanks a lot for you help. – alex.ac Oct 03 '12 at 11:07
0

Why not just store the category directly in the post document? Since it appears that you're creating new category document for each post that uses the category (as evidenced by a 1-to-many linkage using feed_id) then it might make sense to store an array of categories within the post document.

{
'_id' : ObjectId("..."),
'feed_id' : 1,
'title': 'Article title',
'published' : '12/09/2012',
...
categories : [ 'Blog Posts', 'Category 2' ]
}

Then you can do a

db.articles.find({categories : 'Blog Posts' })

To find all the articles with a certain category and you can add a .count() to get the count

Using those feed_ids to join is anathema to MongoDB. You can't join across collections so you either have to denormalize or put everything in one big collection. Mongo is designed so that you'll denormalize everything.

If this doesn't seem like the right way to solve your problem then you might be better suited to use a RDBMS.

Mason
  • 8,767
  • 10
  • 33
  • 34
  • Hm, how did you infer that there's a 1-to-1 relation? I didn't get the impression. – Sergio Tulentsev Oct 04 '12 at 06:02
  • Looks to me like each category has a unique feed_id that it ties to. So even though there might be 10 "c++" categories I was under the impression that a new one will be created for each post that has that category applied. I don't see a way in the current schema to associate a category with more than one post. – Mason Oct 04 '12 at 13:35
  • what about one category with feed_id=1 and ten articles with feed_id=1. Possible? Absolutely. – Sergio Tulentsev Oct 04 '12 at 15:48
  • Yup, you are right, I misunderstood. Either way it's a poor use of Mongo, trying to do a whole bunch of joins like that. – Mason Oct 04 '12 at 19:15
  • Sorry if I wasn't clear enough. For each category I have many articles, so the relationship is indeed 1 to n. Also, in my case an article belongs to a single category. I stored the categories in a separate collection so I can modify them easily. – alex.ac Oct 05 '12 at 11:49
  • I think you'll find most MongoDB tutorials recommend storing the categories directly in the post documents in their denormalized form – Mason Oct 05 '12 at 15:19