264

I am new to message brokers like RabbitMQ which we can use to create tasks / message queues for a scheduling system like Celery.

Now, here is the question:

  • I can create a table in PostgreSQL which can be appended with new tasks and consumed by the consumer program like Celery.

  • Why on earth would I want to setup a whole new tech for this like RabbitMQ?

Now, I believe scaling cannot be the answer since our database like PostgreSQL can work in a distributed environment.

I googled for what problems does the database poses for the particular problem, and I found:

  • polling keeps the database busy and low performing
  • locking of the table -> again low performing
  • millions of rows of tasks -> again, polling is low performing

Now, how does RabbitMQ or any other message broker like that solves these problems?

Also, I found out that AMQP protocol is what it follows. What's great in that?

Can Redis also be used as a message broker? I find it more analogous to Memcached than RabbitMQ.

Please shed some light on this!

Unknown
  • 97
  • 1
  • 2
  • 8
Yugal Jindle
  • 44,057
  • 43
  • 129
  • 197
  • 12
    The impact of locking should be a lot less with PostgreSQL because it implements MVCC where readers are not blocked by writers and vice versa. Most of the articles I've found criticising the use of databases as message queues have MySQL in mind. – CadentOrange Jan 14 '14 at 14:02
  • A message broker moves data between nodes, while a database keeps data in one place. The fact that you can access data in a database from multiple nodes does not, on its face, make it a good tool to transfer data quickly between nodes. – theMayer Jun 16 '16 at 15:35
  • 2
    "scheduling system like `celery`" — I just learnt something which will be useful in my design, from the *question*. Now to read the answers... – Mark K Cowan Aug 15 '16 at 08:11
  • 1
    using message broker producer and consumer is decoupled. – giorgi dvalishvili Feb 20 '18 at 12:55
  • You can view bellow link. It has a wide description: https://stackoverflow.com/a/51377756/3073945 – Md. Sajedul Karim Jul 17 '18 at 09:40
  • @CadentOrange your comment can leave readers with the impression MySQL would perform worse than PostgreSQL, but that's rather misleading. In general, there should be no such problem (writers blocking readers) with MySQL either (talking about InnoDB, which has been the default storage engine as of 2010). Same goes for most other popular databases, such as MS SQL Server, Oracle DB, etc. Now, there are many details and differences between those RDBMSs that could have _some_ impact on the performance, but those can't be covered in a few sentences. – at54321 Jun 28 '23 at 20:30

2 Answers2

143

Rabbit's queues reside in memory and will therefore be much faster than implementing this in a database. A (good)dedicated message queue should also provide essential queuing related features such as throttling/flow control, and the ability to choose different routing algorithms, to name a couple(rabbit provides these and more). Depending on the size of your project, you may also want the message passing component separate from your database, so that if one component experiences heavy load, it need not hinder the other's operation.

As for the problems you mentioned:

  • polling keeping the database busy and low performing: Using Rabbitmq, producers can push updates to consumers which is far more performant than polling. Data is simply sent to the consumer when it needs to be, eliminating the need for wasteful checks.

  • locking of the table -> again low performing: There is no table to lock :P

  • millions of rows of task -> again polling is low performing: As mentioned above, Rabbitmq will operate faster as it resides RAM, and provides flow control. If needed, it can also use the disk to temporarily store messages if it runs out of RAM. After 2.0, Rabbit has significantly improved on its RAM usage. Clustering options are also available.

In regards to AMQP, I would say a really cool feature is the "exchange", and the ability for it to route to other exchanges. This gives you more flexibility and enables you to create a wide array of elaborate routing typologies which can come in very handy when scaling. For a good example, see:


(source: springsource.com)

and: http://blog.springsource.org/2011/04/01/routing-topologies-for-performance-and-scalability-with-rabbitmq/

Finally, in regards to Redis, yes, it can be used as a message broker, and can do well. However, Rabbitmq has more message queuing features than Redis, as rabbitmq was built from the ground up to be a full-featured enterprise-level dedicated message queue. Redis on the other hand was primarily created to be an in-memory key-value store(though it does much more than that now; its even referred to as a swiss army knife). Still, I've read/heard many people achieving good results with Redis for smaller sized projects, but haven't heard much about it in larger applications.

Here is an example of Redis being used in a long-polling chat implementation: http://eflorenzano.com/blog/2011/02/16/technology-behind-convore/

dotancohen
  • 30,064
  • 36
  • 138
  • 197
Jaigus
  • 1,422
  • 1
  • 16
  • 31
  • 3
    I've implemented a JMS implementation (i.e. a message passing system) on top of a database. I can tell you that it *is* possible, but it's not fun and it doesn't usually pay off to do it. Some of the problems you mention can be worked around, but it does increase the complexity quite a lot. All in all I agree: use a dedicated MQ system, if you need one. For low workloads, you can get away with having it in the DB, however. – Joachim Sauer Oct 22 '12 at 07:23
  • That's interesting. What about consistency by the way? What if there are hundreds of jobs on a queue and the node holding them in ram crashes? – Mahn Oct 22 '12 at 14:24
  • @Mahn There are quite a few options available by way of redundancy, and saving some data to disk: http://www.rabbitmq.com/ha.html I feel the best approach is to use a combination of the features offered. – Jaigus Oct 22 '12 at 15:53
  • 31
    Actually, with PostgreSQL there is no polling (see NOTIFY) nor are there table locks (see MVCC). Though PostgreSQL is still not designed for message queuing, it is not completely unsuitable. – jkj Oct 29 '12 at 18:06
  • 4
    Like what @jkj said, there's NOTIFY and no tables locks. The only issue seems like the high bandwidth of messages. Couldn't you have a dedicated PostgreSQL instance instead of maintaining an entirely new system like Rabbit? You can 1) use a single PostgreSQL instance until you reach a bottleneck, then 2) use a dedicated Postgres, then finally 3) easily switch to Rabbit as your broker. Seems like starting with Rabbit is pre-optimizing. – Joe Sep 16 '13 at 19:30
  • RabbitMQ support lot of protocol and it make it easy to interact between different kind of device: embedded system, server, mobile... (as you mention it provide lot of feature for messaging) – Alexandre Jan 23 '17 at 14:49
  • Just to add, some message brokers also allow messages to be transformed in some way before being passed on. That is, you can augment the message with additional information before passing it on from the broker to a client. This can be helpful with legacy or tricky-to-change apps working alongside current apps, in so much as the broker can make messages from them look the same to consumers. – Ralph Bolton Oct 18 '18 at 17:40
  • "Rabbit's queues reside in memory and will therefore be much faster than implementing this in a database." <- is this generally true, if messages stay in cache in the DB? – Tim Barrass Dec 20 '19 at 10:25
  • Postgresql is atomic right? So there is a invisible lock for concurrent write right? – Luk Aron Dec 26 '20 at 07:04
  • 1
    The link **http://eflorenzano.com/blog/2011/02/16/technology-behind-convore/** is 404 – Luk Aron Dec 26 '20 at 07:06
84

PostgreSQL 9.5

PostgreSQL 9.5 incorporates SELECT ... FOR UPDATE ... SKIP LOCKED. This makes implementing working queuing systems a lot simpler and easier. You may no longer require an external queueing system since it's now simple to fetch 'n' rows that no other session has locked, and keep them locked until you commit confirmation that the work is done. It even works with two-phase transactions for when external co-ordination is required.

External queueing systems remain useful, providing canned functionality, proven performance, integration with other systems, options for horizontal scaling and federation, etc. Nonetheless, for simple cases you don't really need them anymore.

Older versions

You don't need such tools, but using one may make life easier. Doing queueing in the database looks easy, but you'll discover in practice that high performance, reliable concurrent queuing is really hard to do right in a relational database.

That's why tools like PGQ exist.

You can get rid of polling in PostgreSQL by using LISTEN and NOTIFY, but that won't solve the problem of reliably handing out entries off the top of the queue to exactly one consumer while preserving highly concurrent operation and not blocking inserts. All the simple and obvious solutions you think will solve that problem actually don't in the real world, and tend to degenerate into less efficient versions of single-worker queue fetching.

If you don't need highly concurrent multi-worker queue fetches then using a single queue table in PostgreSQL is entirely reasonable.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778