
I am using HAProxy to load-balance my MQTT broker cluster. Each MQTT broker can easily handle up to 100,000 connections, but HAProxy is only handling up to about 30k connections per node. Whenever a node gets near 32k connections, the HAProxy CPU suddenly spikes to 100% and all connections start dropping.

The problem is that for every 30k connections I have to roll out another MQTT broker. How can I increase this to at least 60k connections per MQTT broker node?

My virtual machine: 1 CPU, 2 GB RAM. I have tried increasing the CPU count and faced the same problem.

My config:

bind 0.0.0.0:1883
maxconn 1000000
mode tcp

#sticky session load balancing – new feature

stick-table type string len 32 size 200k expire 30m
stick on req.payload(0,0),mqtt_field_value(connect,client_identifier)
option clitcpka # For TCP keep-alive
option tcplog

timeout client 600s
timeout server 2h
timeout check 5000

server mqtt1 10.20.236.140:1883 check-send-proxy send-proxy-v2 check inter 10s fall 2 rise 5
server mqtt2 10.20.236.142:1883 check-send-proxy send-proxy-v2 check inter 10s fall 2 rise 5
server mqtt3 10.20.236.143:1883 check-send-proxy send-proxy-v2 check inter 10s fall 2 rise 5
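
For context, the snippet above is the body of a single listen section; the full file is structured roughly as below, with the global values being placeholders rather than the exact ones in my file:

# Illustrative layout; the global section values are assumptions, the listen body is as posted above
global
    maxconn 1000000        # process-wide cap; HAProxy sizes its fd limit from this
    nbthread 1             # one thread per CPU on the 1-CPU VM

listen mqtt
    bind 0.0.0.0:1883
    mode tcp
    maxconn 1000000
    option clitcpka        # TCP keep-alive towards clients
    option tcplog
    timeout client 600s
    timeout server 2h
    timeout check 5000
    # sticky-session load balancing on the MQTT client identifier
    stick-table type string len 32 size 200k expire 30m
    stick on req.payload(0,0),mqtt_field_value(connect,client_identifier)
    server mqtt1 10.20.236.140:1883 check-send-proxy send-proxy-v2 check inter 10s fall 2 rise 5
    server mqtt2 10.20.236.142:1883 check-send-proxy send-proxy-v2 check inter 10s fall 2 rise 5
    server mqtt3 10.20.236.143:1883 check-send-proxy send-proxy-v2 check inter 10s fall 2 rise 5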


I have also tuned some system parameters:

sysctl -w net.core.somaxconn=60000
sysctl -w net.ipv4.tcp_max_syn_backlog=16384
sysctl -w net.core.netdev_max_backlog=16384

sysctl -w net.ipv4.ip_local_port_range='1024 65535'

sysctl -w net.ipv4.tcp_rmem='1024 4096 16777216'
sysctl -w net.ipv4.tcp_wmem='1024 4096 16777216'

modprobe ip_conntrack
sysctl -w net.nf_conntrack_max=1000000
sysctl -w net.netfilter.nf_conntrack_max=1000000
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30
sysctl -w net.ipv4.tcp_max_tw_buckets=1048576
sysctl -w net.ipv4.tcp_fin_timeout=15
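
Note that sysctl -w changes are runtime-only; to survive a reboot, the same keys can also live in a drop-in file under /etc/sysctl.d/ (the file name below is arbitrary) and be loaded with sysctl --system:

# /etc/sysctl.d/99-mqtt-lb.conf
net.core.somaxconn = 60000
net.ipv4.tcp_max_syn_backlog = 16384
net.core.netdev_max_backlog = 16384
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_rmem = 1024 4096 16777216
net.ipv4.tcp_wmem = 1024 4096 16777216
net.ipv4.tcp_max_tw_buckets = 1048576
net.ipv4.tcp_fin_timeout = 15
# the nf_conntrack keys additionally need the module loaded at boot,
# e.g. via an entry in /etc/modules-load.d/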

tee -a /etc/security/limits.conf << EOF
root      soft   nofile      1048576
root      hard   nofile      1048576
haproxy    soft   nproc      1048576
haproxy    hard   nproc      1048576
EOF
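
For reference, whether those nofile values actually reached the running process can be checked from /proc (the pidof/awk part just grabs the first HAProxy PID):

# shows the effective open-files limit of the running haproxy process
grep 'open files' /proc/$(pidof haproxy | awk '{print $1}')/limits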

Output of haproxy -v:

HAProxy version 2.4.18-1ppa1~focal 2022/07/27 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2026.
Known bugs: http://www.haproxy.org/bugs/bugs-2.4.18.html
Running on: Linux 5.4.0-122-generic #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -O2 -fdebug-prefix-map=/build/haproxy-96Se88/haproxy-2.4.18=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_SYSTEMD=1 USE_PROMEX=1
  DEBUG   = 

Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 -CLOSEFROM -ZLIB +SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT -QUIC +PROMEX -MEMORY_PROFILING

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=1).
Built with OpenSSL version : OpenSSL 1.1.1f  31 Mar 2020
Running on OpenSSL version : OpenSSL 1.1.1f  31 Mar 2020
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.34 2019-11-21
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 9.4.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTTP       side=FE|BE     mux=H2       flags=HTX|CLEAN_ABRT|HOL_RISK|NO_UPG
            fcgi : mode=HTTP       side=BE        mux=FCGI     flags=HTX|HOL_RISK|NO_UPG
       <default> : mode=HTTP       side=FE|BE     mux=H1       flags=HTX
              h1 : mode=HTTP       side=FE|BE     mux=H1       flags=HTX|NO_UPG
       <default> : mode=TCP        side=FE|BE     mux=PASS     flags=
            none : mode=TCP        side=FE|BE     mux=PASS     flags=NO_UPG

Available services : prometheus-exporter
Available filters :
        [SPOE] spoe
        [CACHE] cache
        [FCGI] fcgi-app
        [COMP] compression
        [TRACE] trace

Daga Arihant

1 Answer


I don't see anything in your config that should account for why you can't achieve more than 30K connections without the CPU spiking. I am not sure those tuned system parameters are doing you any good.
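
One concrete example: limits.conf entries are applied by pam_limits to login sessions, so if your HAProxy is started as a systemd service those nofile/nproc lines most likely never reach it; the file-descriptor ceiling for the service comes from the unit instead. A drop-in override is the usual way to raise it (the value below is just an example):

# sudo systemctl edit haproxy
# creates /etc/systemd/system/haproxy.service.d/override.conf
[Service]
LimitNOFILE=1048576
# then: sudo systemctl daemon-reload && sudo systemctl restart haproxy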

For reference, I am successfully running HAProxy (the vanilla Docker image haproxy:2.4.17) with 2 CPUs and 4 GB of memory and can reach my maxconn of 150K without pegging the CPU.
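
If it helps as a baseline, there is nothing exotic about that setup; it is roughly the following (resource limits are illustrative, and the config path is the one the official image expects):

docker run -d --name haproxy \
  --cpus=2 --memory=4g \
  -p 1883:1883 \
  -v "$PWD/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro" \
  haproxy:2.4.17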

Outside of the per-instance scaling problem you are having, it looks like another problem is that you are assuming a 1:1 mapping between HAProxy and MQTT brokers.

"The problem is that for every 30k connections I have to roll out another MQTT broker."

In my experience it is better to run multiple HAProxy nodes in front of a single broker. (Not sure what your constraints are here.) Having more than one HAProxy instance per backend is critical: when you lose an instance, you only lose a fraction of your traffic. (When, not if.) That matters because when connections get dropped, the clients all try to reconnect at the same time; that is how you accidentally DDoS yourself over a lost virtual machine or pod.

At your current vertical scale (CPU and memory) you can get to 30K connections per instance, so with maxconn 30000 per instance it would take about 34 HAProxy nodes to cover the 1M connections you are targeting. That should be easily doable if you run them as a Kubernetes Deployment with something like an ELB in front of your HAProxy cluster. One way or another, you need to figure out how to run HAProxy as a cluster without duplicating your backends.
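
To make the per-instance cap concrete, each HAProxy copy would carry something like this (30000 is the example figure from above, not a magic number):

# per-instance sketch: roughly 34 of these cover a 1M-connection target
global
    maxconn 30000          # hard per-process cap

defaults
    mode tcp
    maxconn 30000          # per-proxy cap, bounded by the global value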

A good rule of thumb is "scale out when scaling up no longer works".