
I'm thinking about a system that will notify multiple consumers about events happening to a population of objects. Each subscriber should be able to subscribe to events happening to zero or more of the objects, and multiple subscribers should be able to receive information about events happening to a single object.

I think some message queuing system would be appropriate in this case, but I'm not sure how to handle the fact that I'll have millions of objects; using a separate topic for each object does not sound good [or is it just fine?].

Can you please suggest an approach I should take, and maybe even an open source message queuing system that would be reasonable?

A few more details:

  • there will be thousands of subscribers [i.e. not that many],
  • each subscriber will subscribe to tens or hundreds of objects,
  • there will be ~5-20 million objects,
  • events themselves don't have to carry any message; the information that an object was changed is enough,
  • the vast majority of objects will never be subscribed to,
  • events occur at a maximum rate of a few hundred per second,
  • ideally the server should run under Linux and be able to integrate with the rest of the ecosystem via HTTP long-poll [using Node.js? continuations under Jetty?].

Thanks in advance for your feedback, and sorry for the somewhat vague question!

ag112
pQd
  • This is a fundamentally difficult problem to solve in a scalable way, as evidenced, for example, by the problems Twitter has been having. You could use a standard topic-subscriber model and use a trick to limit the number of topics: for example, a topic-id could be message-id modulo 1000. The listeners of the topics would then filter only the messages they are interested in. (Just an idea) – Aapo Kyrola Aug 27 '12 at 19:51
  • @Aapo Kyrola - thanks for the hint. can you please post your comment as an answer? also maybe you can suggest a particular message queuing server? – pQd Aug 27 '12 at 19:56
  • have you looked at http://aws.amazon.com/sqs/? And at all the tools that they could provide (notifications, etc) – Resh32 Sep 04 '12 at 14:25
  • @Resh32 - thanks for the hint, but i'm looking for a solution that can be used in-house. – pQd Sep 04 '12 at 15:24
  • Take a look at the Actors idiom (like in Erlang or Scala) and use immutable data structures; this may save you a lot of programming effort :-) – tuxSlayer Sep 04 '12 at 22:14
  • I recently read an interesting article about how the folks at Twitter are using Scala: http://www.artima.com/scalazine/articles/twitter_on_scala.html – cyber-monk Sep 05 '12 at 15:43
  • I want to ask some questions for clarification. Will all objects be living in the memory of a single machine? Are 10000 subscribers and 1000 subscriptions per subscriber a realistic upper bound? – MichaelT Sep 05 '12 at 21:07
  • the 'objects' are actually something else entirely; they are not objects in the OO sense. sorry for not being clear. you can assume those are people or clients about whom i store information somewhere else. i just need a mechanism with which i can quickly notify subscribed consumers about ongoing changes. it's enough to let them know 'something changed'; the message itself does not have to carry any additional payload. – pQd Sep 05 '12 at 22:33
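Aapo Kyrola's modulo trick can be sketched without any particular broker. This is a minimal in-memory illustration, assuming integer object ids; `ShardedBus`, `Subscriber`, `topic_for` and the `objects.<n>` topic names are hypothetical, and a real deployment would map each shard to a broker topic instead of a Python list of listeners.

```python
from collections import defaultdict

NUM_TOPICS = 1000  # the "modulo 1000" from the comment; an arbitrary, tunable choice


def topic_for(object_id: int) -> str:
    """Map an object id onto one of NUM_TOPICS shard topics (hypothetical naming)."""
    return f"objects.{object_id % NUM_TOPICS}"


class ShardedBus:
    """In-memory stand-in for a broker that only ever has NUM_TOPICS topics."""

    def __init__(self):
        self._listeners = defaultdict(list)  # topic name -> listener callbacks

    def subscribe(self, topic, listener):
        self._listeners[topic].append(listener)

    def publish(self, object_id):
        # The message only carries the id; "something changed" is implied.
        for listener in self._listeners.get(topic_for(object_id), ()):
            listener(object_id)


class Subscriber:
    def __init__(self, name, bus, object_ids):
        self.name = name
        self.wanted = set(object_ids)
        self.received = []
        # Subscribe once per shard topic, not once per object.
        for topic in {topic_for(oid) for oid in self.wanted}:
            bus.subscribe(topic, self.on_message)

    def on_message(self, object_id):
        # Client-side filter: a shard topic also carries other objects' events.
        if object_id in self.wanted:
            self.received.append(object_id)
```

The price of capping the topic count is wasted delivery: objects 42 and 1042 share `objects.42`, so a subscriber to 42 also receives (and discards) events for 1042. With a few hundred events per second that filtering cost is small.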

5 Answers


I can highly recommend RabbitMQ. I have used it in a couple of projects before, and in my experience it is very reliable and offers a wide range of configurations. Basically, RabbitMQ is an open-source (Mozilla Public License (MPL)) message broker that implements the Advanced Message Queuing Protocol (AMQP) standard.

As documented on the RabbitMQ web-site:

RabbitMQ can potentially run on any platform that Erlang supports, from embedded systems to multi-core clusters and cloud-based servers.

... meaning that an operating system like Linux is supported.

There is a library for node.js here: https://github.com/squaremo/rabbit.js

It comes with an HTTP-based API for management and monitoring of the RabbitMQ server, including a command-line tool and a browser-based user interface; see: http://www.rabbitmq.com/management.html.

In the projects I have been working on, I have communicated with RabbitMQ using C# and two different wrappers, EasyNetQ and Burrow.NET. Both are excellent wrappers for RabbitMQ, but I ended up preferring Burrow.NET as it is easier and more obvious to work with (it doesn't do a lot of magic under the hood) and provides good flexibility to inject loggers, serializers, etc.

I have never worked with the number of objects that you are going to work with; I have worked with thousands (not millions). However, no matter how many objects I have been playing around with, RabbitMQ has always been really stable and has never been the source of errors in the system.

So to sum up: RabbitMQ is simple to use and set up, supports AMQP, can be managed via HTTP and, what I like the most, it's rock solid.

Lasse Christiansen

Break up the topics so that each carries a specific kind of event, e.g. an "Object updated" topic and an "Object deleted" topic. Clients then only need to subscribe to the finite number of event-based topics they are interested in.

Inject headers into your messages when you publish them, and put intelligence into the clients to use these headers as message selectors. For example, the client knows the list of objects it is interested in; if you identify each object by an "id", the id can be a header, and the client will use the "id" header to determine whether it is interested in the message.
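The header-plus-selector idea above, reduced to plain Python. This is a minimal sketch in which the selector runs on the client, as the answer describes; JMS brokers such as ActiveMQ also offer message selectors that can move this filtering into the broker, which is not shown here. `Message`, `publish_event`, `SelectingClient` and the topic names `object.updated` / `object.deleted` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    topic: str                                   # e.g. "object.updated" (hypothetical)
    headers: dict = field(default_factory=dict)
    body: bytes = b""                            # empty: the event itself is the information


def publish_event(topic: str, object_id: str) -> Message:
    # The publisher injects the object id as a header at publish time.
    return Message(topic=topic, headers={"id": object_id})


class SelectingClient:
    """Listens on a few event-type topics and filters on the "id" header."""

    def __init__(self, topics, interesting_ids):
        self.topics = set(topics)
        self.interesting_ids = set(interesting_ids)
        self.accepted = []

    def selector(self, msg: Message) -> bool:
        return msg.topic in self.topics and msg.headers.get("id") in self.interesting_ids

    def deliver(self, msg: Message) -> None:
        if self.selector(msg):
            self.accepted.append((msg.topic, msg.headers["id"]))
```

A client interested in updates to `a1` and `b2` keeps the `object.updated` message for `a1`, drops updates for any other id, and never sees `object.deleted` at all.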

Depending on your requirements, you may also want to consider guaranteed delivery, to make sure that the client receives the message even if it goes offline and comes back later.
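The core of "receive it even if the client was offline" is a backlog kept per registered subscriber, in the spirit of a durable subscription. This sketch only shows that buffering; it is in-memory, and a real broker would additionally persist the backlog and wait for acknowledgements. `DurableTopic` and its methods are hypothetical names.

```python
from collections import deque


class DurableTopic:
    """Keeps a backlog per registered subscriber so offline clients miss nothing.

    In-memory only: persistence to disk and acknowledgements are omitted.
    """

    def __init__(self):
        self._backlog = {}  # subscriber id -> deque of undelivered messages
        self._online = {}   # subscriber id -> callback while connected

    def register(self, sub_id):
        self._backlog.setdefault(sub_id, deque())

    def connect(self, sub_id, callback):
        self._online[sub_id] = callback
        # Drain whatever accumulated while the subscriber was away.
        backlog = self._backlog.setdefault(sub_id, deque())
        while backlog:
            callback(backlog.popleft())

    def disconnect(self, sub_id):
        self._online.pop(sub_id, None)

    def publish(self, msg):
        for sub_id, backlog in self._backlog.items():
            callback = self._online.get(sub_id)
            if callback is not None:
                callback(msg)        # connected: deliver immediately
            else:
                backlog.append(msg)  # offline: keep it for later
```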

The options I would recommend off the top of my head are ActiveMQ, RabbitMQ and Redis Pub/Sub (I haven't really worked with Redis Pub/Sub, so please do your due diligence).

Finally here are some performance benchmarks for RabbitMQ and Redis

I just saw that you only have a few hundred messages pushed out per second; this is not a big deal for ActiveMQ. I have been using AMQ on a system that processes 240 messages per second, and it works just fine. I do use a thread pool of workers to process the messages asynchronously, though. Look at a framework like Akka if you are in Java land; if not, stick with Node.js and the cool ecosystem around it.
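The "thread pool of workers" pattern from this answer, as a small sketch. A `queue.Queue` stands in for the broker subscription (an assumption; a real consumer would read from the ActiveMQ or RabbitMQ client library), and `None` is used as a hypothetical stop signal. The receiving loop stays single-threaded while handlers run concurrently.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def consume(inbox: queue.Queue, handler, workers: int = 8):
    """Pull messages off `inbox` and hand each one to a pool of worker threads.

    A None item means "stop". Returns the handler results in the order
    the messages were received.
    """
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            msg = inbox.get()
            if msg is None:
                break
            futures.append(pool.submit(handler, msg))
    # Leaving the with-block waits for all submitted work to finish.
    return [f.result() for f in futures]
```

Keeping the receive loop separate from the handlers means a slow handler (for example one that fetches object details) does not stall consumption from the broker.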

JVXR

If it has to be open source, I'd go for ActiveMQ and an application server to provide the JMS functionality for topics; it has Ajax support, so you can access them from your client.

So you would use the JMS infrastructure to publish the topics for the objects, and you can create topics as you need them.
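"Create topics as you need them" matters here because the question says the vast majority of objects are never subscribed to. A minimal sketch, independent of ActiveMQ/JMS: a per-object topic only exists while it has subscribers, so the number of live topics is bounded by the subscriptions rather than by the 5-20 million objects. `OnDemandTopics` and its methods are hypothetical.

```python
class OnDemandTopics:
    """One topic per object, created only when someone subscribes to it."""

    def __init__(self):
        self._topics = {}  # object id -> set of subscriber ids

    def subscribe(self, object_id, sub_id):
        self._topics.setdefault(object_id, set()).add(sub_id)

    def unsubscribe(self, object_id, sub_id):
        subs = self._topics.get(object_id)
        if subs is None:
            return
        subs.discard(sub_id)
        if not subs:
            del self._topics[object_id]  # drop topics nobody listens to

    def publish(self, object_id):
        """Return the subscribers to notify; for most objects this is empty."""
        return sorted(self._topics.get(object_id, ()))

    def topic_count(self):
        return len(self._topics)
```

Publishing an event for an object nobody watches is just a dictionary miss, which is the common case under the stated workload.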

Besides, by using a Java application server you may be able to take advantage of clustering, load balancing and other high-availability features (obviously depending on the selected product).

Hope that helps!!!

Carlos Grappa
  • can ActiveMQ handle millions of topics? – pQd Sep 04 '12 at 17:46
  • I would guess so; however, it's not so much about the topics as about the hardware (some CPU and surely a lot of RAM) and the underlying operating system (number of connections at any given time, sockets/threads, and the limitations of the TCP stack) – Carlos Grappa Sep 04 '12 at 17:53
  • to be honest i was looking for answers from people having experience with similar size rather than assumptions that 'it should work'. but thanks anyway. – pQd Sep 05 '12 at 09:55
  • ActiveMQ is a pretty solid messaging implementation. It may not be able to handle millions of messages per second. But there are ways to tune it to spit fire. http://activemq.apache.org/performance.html – JVXR Sep 05 '12 at 23:37

Since your messages are very small, you might want to consider MQTT, which is designed for small devices, although it works fine on powerful devices as well. The key consideration is the low overhead: basically a 2-byte header for a small message. You probably can't use a simple or open source MQTT server due to your volume; you probably need a heavy-duty dedicated appliance like MessageSight to handle it.
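To make the "2-byte header" concrete, here is a sketch that builds an MQTT 3.1.1 PUBLISH packet at QoS 0 by hand: one byte for packet type and flags, then the remaining length in MQTT's variable-length encoding (a single byte while it is below 128). The topic `obj/42` is a hypothetical naming scheme, and the payload is empty because the question says the event carries no message. This is for illustration only; in practice an MQTT client library builds these packets.

```python
def encode_remaining_length(n: int) -> bytes:
    """MQTT variable-length integer: 7 bits per byte, high bit = more bytes follow."""
    if not 0 <= n <= 268_435_455:
        raise ValueError("remaining length out of range")
    out = bytearray()
    while True:
        byte = n % 128
        n //= 128
        if n > 0:
            byte |= 0x80  # continuation bit
        out.append(byte)
        if n == 0:
            return bytes(out)


def publish_packet(topic: str, payload: bytes = b"") -> bytes:
    """Build an MQTT 3.1.1 PUBLISH packet with QoS 0 (so no packet identifier)."""
    topic_bytes = topic.encode("utf-8")
    # Variable header: topic name as a 2-byte big-endian length plus UTF-8 bytes.
    variable_header = len(topic_bytes).to_bytes(2, "big") + topic_bytes
    body = variable_header + payload
    # Fixed header: 0x30 = packet type 3 (PUBLISH), DUP/QoS/RETAIN all zero.
    return bytes([0x30]) + encode_remaining_length(len(body)) + body
```

For `publish_packet("obj/42")` the whole packet is 10 bytes: the 2-byte fixed header plus 8 bytes of topic name, so in this scheme the topic string dominates the size.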

Some more details on your application would certainly help. Also, you don't mention security at all; I assume you must have some needs in this area.

  • thanks for your answer. actually it'll be all internal traffic between 'trusted' processes / machines - so there's no need for security features. – pQd May 18 '14 at 18:14

I'm not sure about your work environment, but here are my two bits. Can you identify each object with a unique ID in your system? If so, you can have a topic per event type: for example, if you want to track object deletion events, object update events and so on, you can have a topic for each event type. A message with the object's ID would be published to the corresponding topic whenever that event happens to an object. This limits the number of topics you need.

The second part of your problem is that different subscribers want to subscribe to different objects, so not every subscriber is interested in the events of every object. This is what the message selector (filtering) mechanism provided by the messaging framework is for. So basically you need to determine on what basis a subscriber is interested in a particular object, and use that basis as the message filtering criterion. It could be anything: object type, object state, etc. Ultimately your system would consist of one topic for each event type, with publishers sending event messages carrying {object-type:object-id} information. Subscribers could subscribe to any topic with a filtering criterion.
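A small sketch of that design: one topic per event type, a `{object-type}:{object-id}` payload, and an arbitrary filter per subscription. Here the filter is a Python predicate evaluated in-process, which is an assumption; with a real broker it would be whatever selector syntax that product supports. `EventTypeTopics`, `parse_event`, and the event type names are hypothetical.

```python
from typing import Callable, NamedTuple


class Event(NamedTuple):
    object_type: str
    object_id: str


def parse_event(payload: str) -> Event:
    # Payload format from the answer: "{object-type}:{object-id}".
    object_type, _, object_id = payload.partition(":")
    return Event(object_type, object_id)


class EventTypeTopics:
    """One topic per event type; each subscription carries its own filter."""

    def __init__(self):
        self._subs = {}  # event type -> list of (subscriber name, predicate)

    def subscribe(self, event_type: str, name: str,
                  predicate: Callable[[Event], bool]) -> None:
        self._subs.setdefault(event_type, []).append((name, predicate))

    def publish(self, event_type: str, payload: str) -> list:
        """Return the names of subscribers whose filter accepts the event."""
        event = parse_event(payload)
        return [name for name, pred in self._subs.get(event_type, ()) if pred(event)]
```

A filter can select by id (`lambda e: e.object_id in {"17", "99"}`), by type (`lambda e: e.object_type == "client"`), or accept everything on that event-type topic.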

If the above solution fits, you can use any messaging solution: ActiveMQ, WMQ, RabbitMQ.

ag112
  • i can identify objects through an id. i don't need to track what happened; the information that something happened is enough, as it will tell the client to retrieve the object details. subscribers will subscribe to [relatively] very few objects. – pQd Sep 06 '12 at 07:37
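Because only "something changed" has to be delivered and the client fetches the details itself, repeated changes to the same object can be coalesced. One possible shape, sketched here as an assumption rather than taken from any answer: an inverted index from object id to subscribers, a pending set per subscriber, and a blocking `poll` that an HTTP long-poll handler (Node.js, Jetty continuations, etc., not shown) could call. The index holds at most (subscribers × subscriptions per subscriber) entries, independent of the total object count. `ChangeNotifier` and its methods are hypothetical.

```python
import threading
from collections import defaultdict


class ChangeNotifier:
    """Inverted index (object id -> subscribers) plus a pending set per subscriber."""

    def __init__(self):
        self._watchers = defaultdict(set)  # object id -> subscriber ids
        self._pending = defaultdict(set)   # subscriber id -> changed object ids
        self._cond = threading.Condition()

    def subscribe(self, sub_id, object_id):
        with self._cond:
            self._watchers[object_id].add(sub_id)

    def changed(self, object_id):
        with self._cond:
            watchers = self._watchers.get(object_id)
            if not watchers:
                return  # the common case: nobody watches this object
            for sub_id in watchers:
                self._pending[sub_id].add(object_id)  # repeated changes coalesce
            self._cond.notify_all()

    def poll(self, sub_id, timeout=30.0):
        """Long-poll: block until something changed or the timeout expires."""
        with self._cond:
            self._cond.wait_for(lambda: self._pending.get(sub_id), timeout)
            return self._pending.pop(sub_id, set())
```

An empty set returned from `poll` means the timeout expired, and the HTTP layer would simply answer with "nothing changed" and let the client reconnect.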