MongoDB Schema Design - Real-time Chat

Question

I'm starting a project which I think will be particularly suited to MongoDB due to the speed and scalability it affords.

The module I'm currently interested in is to do with real-time chat. If I was to do this in a traditional RDBMS I'd split it out into:

Channel (A channel has many users)
User (A user has one channel but many messages)
Message (A message has a user)

For the purpose of this use case, I'd like to assume that there will typically be 5 channels active at one time, each handling at most 5 messages per second.

Specific queries that need to be fast:

Fetch new messages (based on a bookmark, a time stamp maybe, or an incrementing counter?)
Post a message to a channel
Verify that a user can post in a channel

Bearing in mind that the document limit with MongoDB is 4mb, how would you go about designing the schema? What would yours look like? Are there any gotchas I should watch out for?

score 3 · Answer 1 · edited Sep 17 '13 at 10:05

Why use mongo for a messaging system? No matter how fast the static store is (and mongo is very fast), whether it's mongo or another db, to mimic a message queue you're going to have to use some kind of polling, which is not very scalable or efficient. Granted, you're not doing anything terribly intense, but why not just use the right tool for the job? Use a messaging system like Rabbit or ActiveMQ.

If you must use mongo (maybe you just want to play around with it and this project is a good chance to do that?) I imagine you'll have a collection for users (where each user object has a list of the queues that user listens to). For messages, you could have a collection for each queue, but then you'd have to poll each queue you're interested in for messages. Better would be to have a single collection as a queue, as it's easy in mongo to do "in" queries on a single collection, so it'd be easy to do things like "get all messages newer than X in any queues where queue.name in list [a,b,c]".
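As a rough illustration of the single-collection idea, the sketch below builds the kind of filter document such an "in" query would use. The field names `queue` and `seq` and both helper names are assumptions made up for this example; the small in-memory matcher only stands in for the server so the snippet runs without a database.

```python
from typing import Any

Message = dict[str, Any]


def new_messages_filter(queues: list[str], after_seq: int) -> dict[str, Any]:
    """Filter for messages in any of `queues` with a sequence number above `after_seq`."""
    return {"queue": {"$in": queues}, "seq": {"$gt": after_seq}}


def matches(doc: Message, flt: dict[str, Any]) -> bool:
    """Minimal stand-in for server-side matching; handles only $in and $gt."""
    for field, cond in flt.items():
        value = doc.get(field)
        if "$in" in cond and value not in cond["$in"]:
            return False
        if "$gt" in cond and not (value is not None and value > cond["$gt"]):
            return False
    return True


# Hypothetical messages as they might sit in one shared collection.
messages: list[Message] = [
    {"queue": "a", "seq": 1, "user": "ann", "text": "hi"},
    {"queue": "b", "seq": 2, "user": "bob", "text": "hello"},
    {"queue": "d", "seq": 3, "user": "dan", "text": "not subscribed"},
    {"queue": "c", "seq": 4, "user": "cat", "text": "hey"},
]

# "All messages newer than seq 1 in queues a, b or c", oldest first.
flt = new_messages_filter(["a", "b", "c"], after_seq=1)
fresh = sorted((m for m in messages if matches(m, flt)), key=lambda m: m["seq"])
```

With a driver such as PyMongo, the same filter dict could presumably be passed straight to a collection's `find()` call, and an index covering `queue` and `seq` would likely be needed to keep the query fast.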

You might also consider setting up your collection as a mongo capped collection, which just means that you tell mongo when you set up the collection that your collection should only hold X number of bytes, or X number of items. Adding additional items has First-In, First-Out behavior which is pretty much ideal for a message queue. But again, it's not really a messaging system.
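The first-in, first-out eviction described above can be pictured with a bounded deque. This is only an in-memory analogy, not MongoDB itself, and the limit of 3 items is an arbitrary value picked for the example. (In PyMongo, a capped collection is created by passing `capped=True` and a `size` in bytes to `create_collection`.)

```python
from collections import deque

# Stand-in for a capped collection that holds at most 3 items.
capped = deque(maxlen=3)

# Insert five messages; once the cap is reached, each append evicts the oldest.
for seq in range(1, 6):
    capped.append({"seq": seq, "text": f"message {seq}"})

# Only the three most recent messages remain, still in insertion order.
remaining = [m["seq"] for m in capped]
```

A client that falls more than one "cap" behind would miss messages in this model, which is one reason it is not a full messaging system.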

edited Sep 17 '13 at 10:05

Simone

answered May 29 '10 at 21:19

Steve B.

I would not suggest that the MQ solutions out there are really that much better than some of the NoSQL solutions out there. A lot of MQ tech seems complicated & over-engineered, plus performance isn't always that great, stability & portability may also be sacrificed. See: http://bhavin.directi.com/rabbitmq-vs-apache-activemq-vs-apache-qpid/ – Klinky May 30 '10 at 00:08

There are decent MQ solutions out there, I just find they're the ones without much in the way of features, ZeroMQ and Kestrel are both good for their purposes. ActiveMQ on the other hand is horrific. – Michael May 30 '10 at 03:48
@Klinky I bet almost any specific MQ solution (especially ActiveMQ) would deal with the messaging (EDA) problem many times better than a custom solution based on a NoSQL store of an unspecified type (did you mean a document-oriented DB, a key-value store, or what?), because MQ solutions are designed for that problem, and, FTN, ActiveMQ uses its own optimized high-performance data storage for queue persistence. – Vasil Remeniuk Jun 03 '10 at 07:09

@Steve B. "...,which is not very scalable or efficient" -- I don't agree on "scalable" (though I agree on efficiency and performance). Why? As opposed to storing queues in memory (which leads to problems if you have 1+ node in your cluster -- you either need to set up replication or build a network of brokers), making multiple consumers work on a persisted queue seems to be less problematic (especially considering failure scenarios). – Vasil Remeniuk Jun 03 '10 at 07:15
@Vasil. MQ solutions all seem to have their own thought process and methodology w/ large length specs and stuffy documentation. A lot of them seem angled for enterprise situation which may need complex setups. When I was investigating MQs, I found a blog on the complexities of getting one stable for an enterprise SMS based application, read about Twitter developing their own MQ solution because of failure w/ActiveMQ & RabbitMQ. Also the link I posted is offering ActiveMQ w/ 22K msg/sec, which is not a speed demon. Not many details are given about their setup, but it's at least one data point. – Klinky Jun 04 '10 at 00:27

@Klinky Twitter developers have done a lot of weird stuff, you know :) (if you had a chance to read a book about Scala by one of the Twitter's lead architects, you may guess, how "good" is their MQ solution). Regarding ActiveMQ - personally I've had an extremely good experience with it (I was using it to build a merely huge distributed mass mailing system). ~30-60k/sec throughput is a basic setup with one broker - if you build a network of brokers, performance could be times higher. – Vasil Remeniuk Jun 04 '10 at 06:35
@Vasil, to each his own I guess. I just found NoSQL more straightforward to get started with. I understand what a queue is and that I want to put stuff on it and take stuff off. Something like Redis makes this super easy to do. As far as Redis performance, I can push on to a queue about 35K msgs/sec. Potentially retrieve from the queue at up to 400K msgs/ sec. Tested on my Celeron E3200 1MB L2 @ 3.8Ghz overclock, inside Ubuntu Virtualbox w/ IntelVT enabled. Redis is not multi-threaded so this is only using 1 of 2 cores. I guess it depends on what you need your 'MQ' to do. – Klinky Jun 05 '10 at 00:21
>>> I can push on to a queue about 35K msgs/sec. Potentially retrieve from the queue at up to 400K msgs/ sec.<<< Hehe) Sounds interesting and promising. I haven't had much chance to get my hands dirty with Redis, and would be happy to glance over a good architecture that uses it -- is your MQ solution part of a proprietary system, or is it open? – Vasil Remeniuk Jun 05 '10 at 05:29

score 3 · Accepted Answer · answered May 30 '10 at 00:07

I used Redis, NGINX & PHP-FPM for my chat project. Not super elegant, but it does the trick. There are a few pieces to the puzzle.

There is a very simple PHP script that receives client commands and puts them in one massive LIST. It also checks all room LISTs and the user's private LIST to see if there are messages it must deliver. This script is polled every few seconds by a client written in jQuery.
There is a command-line PHP script that runs server side in an infinite loop, 20 times per second, which checks this list and then processes the commands. The script keeps track of who is in what room, and of permissions, in its own memory; this info is not stored in Redis.
Redis has a LIST for each room & a LIST for each user, which operates as a private queue. It also has a counter for each room the user is in. If the user's counter is less than the total number of messages in the room, the script gets the difference and sends it to the user.
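The counter-based delivery in the last point can be sketched in plain Python. This is an in-memory illustration under assumed names (`ChatState`, `post`, `poll`), not the PHP/Redis code described above; with Redis, the list length and slice would come from commands such as LLEN and LRANGE.

```python
from collections import defaultdict


class ChatState:
    """In-memory stand-in: one message list per room, one counter per (user, room)."""

    def __init__(self) -> None:
        self.rooms: dict[str, list[str]] = defaultdict(list)
        self.delivered: dict[tuple[str, str], int] = defaultdict(int)

    def post(self, room: str, text: str) -> None:
        """Append a message to the room's list."""
        self.rooms[room].append(text)

    def poll(self, user: str, room: str) -> list[str]:
        """Return messages the user has not seen yet and advance the user's counter."""
        total = len(self.rooms[room])
        seen = self.delivered[(user, room)]
        pending = self.rooms[room][seen:total]  # the "difference" to send
        self.delivered[(user, room)] = total
        return pending


state = ChatState()
state.post("lobby", "hi")
state.post("lobby", "anyone here?")
first = state.poll("ann", "lobby")   # both messages are new to ann
state.post("lobby", "yes")
second = state.poll("ann", "lobby")  # only the message posted since the last poll
third = state.poll("ann", "lobby")   # nothing new
```

In this sketch a user who joins late receives the whole room history on the first poll; setting the counter to the room's current length at join time would avoid that.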

I haven't been able to stress test this solution, but at least from my basic benchmarking it could probably handle many thousands of messages per second. There is also the opportunity to port this over to something like Node.js to increase performance. Redis is also maturing and has some interesting features, like the Publish/Subscribe commands, which might be of interest and could possibly remove the polling on the server side.

I looked into Comet-based solutions, but many of them were complicated, poorly documented, or would require me to learn an entirely new language (e.g. Jetty->Java, APE->C), etc. Also, delivery and going through proxies can sometimes be an issue with Comet. That is why I've stuck with polling.

I imagine you could do something similar with MongoDB: a collection per room, a collection per user & then a collection which maintains counters. You'll still need to write a back-end daemon or script to handle managing where these messages go. You could also use MongoDB's capped ("limited") collections, which keep the documents in insertion order & also automatically clear old messages out, but that could make maintaining proper counters complicated.

score 1 · Answer 3 · answered Jun 14 '10 at 00:42

1) ape-project.org

2) http://code.google.com/p/redis/

3) after you're through all this - you can dump data into mongodb for logging and store consistent data (users, channels) as well

answered Jun 14 '10 at 00:42

Toby
