RabbitMQ cluster crashing when creating queues

Question

Hello, I have a question that I could not solve after about two days of searching, so I am writing it up here as clearly as possible in case it helps others too.

The scenario is:

We have an application that will handle about 200k devices through the AMQP protocol using a RabbitMQ cluster.
We thought of having 1 exchange with 200k queues, each with around 6 routing keys, for the devices.
These queues need to be durable and lazy, as we don't want to lose any messages.
We are using queue mirroring, as we need HA.

The test:

1. I created a cluster with 5 nodes and a replication factor of 2, using this policy:

    "definition": {
            "ha-mode": "exactly",
            "ha-params": 2,
            "ha-sync-mode": "automatic",
            "ha-sync-batch-size": 1
          }
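
For readers who want to apply a policy like this programmatically, here is a minimal sketch that only builds the request for the management HTTP API's `PUT /api/policies/{vhost}/{name}` endpoint, without sending it. The host, the policy name `ha-two` and the pattern `.*` are assumptions made for illustration; only the `definition` values come from the question.

```python
import json
from urllib.parse import quote


def build_policy_request(host, vhost, name, pattern):
    """Return (url, json_body) for a mirroring policy; nothing is sent."""
    # The vhost must be percent-encoded, so the default "/" becomes "%2F".
    url = "http://{}:15672/api/policies/{}/{}".format(
        host, quote(vhost, safe=""), quote(name, safe="")
    )
    body = {
        "pattern": pattern,      # assumption: which queue names the policy matches
        "apply-to": "queues",
        "definition": {          # values copied from the policy in the question
            "ha-mode": "exactly",
            "ha-params": 2,
            "ha-sync-mode": "automatic",
            "ha-sync-batch-size": 1,
        },
    }
    return url, json.dumps(body)


# Hypothetical host, policy name and pattern.
url, body = build_policy_request("localhost", "/", "ha-two", ".*")
print(url)
print(body)
```

The returned URL and body could then be sent with any HTTP client, using management credentials for the cluster.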

2. I created 50k durable, lazy queues, along with their routing keys:

    def create_one_queue(queue_name, threadName, channel):
        channel.queue_declare(queue=queue_name, durable=True, arguments={'x-queue-mode': 'lazy'})
        for bind in BINDINGS:
            channel.queue_bind(exchange=EXCHANGE, queue=queue_name, routing_key=bind.format(queue_name))
        print("[{}]Created Queue {}".format(threadName, queue_name))

    def create_queues(threadName, base):
        channel = get_channel()
        for i in range(0, 1000):
            try:
                queue_name = str(i + base)
                create_one_queue(queue_name, threadName, channel)
            except Exception as e:
                print(e)
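
The snippet above does not show how `base` is chosen or what `BINDINGS` contains. The following sketch illustrates one way the pieces could fit together, assuming each thread gets a distinct block of 1000 queue names; the binding templates are hypothetical placeholders, not the ones actually used.

```python
QUEUES_PER_THREAD = 1000  # matches range(0, 1000) in create_queues

# Hypothetical templates; the real BINDINGS list is not shown in the question.
BINDINGS = ["device.{}.telemetry", "device.{}.commands"]


def thread_bases(total_queues, per_thread=QUEUES_PER_THREAD):
    """Starting offsets so that each thread creates a disjoint block of names."""
    return list(range(0, total_queues, per_thread))


def binding_keys(queue_name):
    """Routing keys produced by bind.format(queue_name) for one queue."""
    return [template.format(queue_name) for template in BINDINGS]


bases = thread_bases(50_000)
print(len(bases), bases[:3])
print(binding_keys("42"))
```

Under these assumptions, 50k queues would need 50 calls to `create_queues`, one per base.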

3. When I tried to keep growing toward 200k queues, the nodes started to crash without running out of resources.
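
To put the target in numbers, a back-of-the-envelope calculation may help. It is only a sketch: it assumes queue replicas are spread evenly across the nodes, which a real cluster does not guarantee, and the function name is made up for illustration.

```python
def cluster_load(queues, bindings_per_queue, replicas, nodes):
    """Rough totals and per-node load, assuming an even spread of replicas."""
    total_bindings = queues * bindings_per_queue
    total_replicas = queues * replicas  # each queue lives on `replicas` nodes
    return {
        "total_bindings": total_bindings,
        "total_replicas": total_replicas,
        "replicas_per_node": total_replicas // nodes,
    }


# Figures from the question: 200k queues, about 6 routing keys each,
# "exactly 2" mirroring, 5 nodes.
load = cluster_load(queues=200_000, bindings_per_queue=6, replicas=2, nodes=5)
print(load)
```

With these inputs the cluster would hold 1.2 million bindings and 400k queue replicas, or roughly 80k replicas per node.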

Links

I have already looked at the following posts:

https://www.rabbitmq.com/ha.html#ways-to-configure

https://www.cloudamqp.com/blog/2018-01-09-part3-rabbitmq-best-practice-for-high-availability.html

RabbitMQ - How many queues RabbitMQ can handle on a single server?

https://serverfault.com/questions/378165/rabbitmq-reasonable-performance-scale-expectations

http://rabbitmq.1065348.n5.nabble.com/How-many-queues-can-one-broker-support-td21539.html

https://www.quora.com/RabbitMQ/Can-rabbitMQ-or-zeroMQ-handle-1mil-queues

but I see contradictions (CloudAMQP suggests using few queues, while other sources say you can reach 1M queues).

Questions

How is it possible that the cluster starts to crash if I am not running out of resources?
Is my approach wrong?
Any advice to improve my cluster configuration?

Thanks a lot

You should ask such questions on the rabbitmq-users Google group; the RabbitMQ engineers don't monitor Stack Overflow closely. — Gary Russell, May 05 '20 at 13:22
Thanks @GaryRussell for the tip, I already did. I see you are experienced with RabbitMQ; have you ever worked with or seen a cluster with this many queues? I need to know at least whether it is possible. Thanks — Pato Navarro, May 05 '20 at 14:11

Pato Navarro · Accepted Answer · 2020-05-07T12:08:12.363

Ok I will answer my question with the results of my findings so far:

1) As I was using Kubernetes and Helm to deploy the cluster, I was putting too much memory pressure on the pods, leaving no free space for the garbage collector. https://www.rabbitmq.com/memory-use.html#queue-memory-usage-gc

High memory watermark blocks publishers and prevents new messages from being enqueued. Since garbage collection can double the memory used by a queue, it is unsafe to set the high memory watermark above 0.5. The default high memory watermark is set to 0.4 since this is safer as not all memory is used by queues. This is entirely workload specific, which differs across RabbitMQ deployments.

2) Seems ok.

3) In order to create 200k durable and lazy queues, I had to use a cluster of 10 nodes, each with 8 vCPUs and 30 GB of RAM.
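
As a rough illustration of point 1, the sketch below computes where the high memory watermark sits for a given pod and whether the pod could still absorb a garbage collection that doubles the memory in use, as the quoted documentation warns. It assumes the pod memory limit equals the 30 GB node size, which may not match the actual Helm values; the function name is invented for this example.

```python
def memory_budget(pod_memory_gb, watermark):
    """Return (threshold_gb, fits_gc) for a relative high memory watermark.

    fits_gc is True when doubling the memory used at the watermark
    still stays within the pod limit, i.e. when watermark <= 0.5.
    """
    threshold_gb = pod_memory_gb * watermark
    worst_case_gb = threshold_gb * 2  # GC may temporarily double usage
    return threshold_gb, worst_case_gb <= pod_memory_gb


for wm in (0.4, 0.5, 0.8):
    threshold, fits = memory_budget(30, wm)
    print("watermark={} threshold={:.1f} GB fits_gc={}".format(wm, threshold, fits))
```

In this simplified model, only watermarks up to 0.5 leave enough room for the worst case, which is consistent with the documentation's advice not to go above 0.5.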

Note: I will keep this answer up to date as I tune my cluster.

what do you mean with "I was putting too much pressure"? Like, by staring the queue? — suren, May 07 '20 at 09:52
I have already updated the answer. I meant that I was using practically all of the pod memory. I set the value "rabbitmqMemoryHighWatermark" to 0.5 of the pod memory and it started to work better (until it ran out of memory) — Pato Navarro, May 07 '20 at 12:10
