
Every few months, when thinking through a personal project that involves sockets, I find myself asking the same question: "How would you properly load balance sockets on a dynamic, horizontally scaling WebSocket server?"

I understand the theory behind horizontally scaling WebSockets and using pub/sub models to get data to the right server that holds the socket connection for a specific user. I think I understand ways to effectively identify the server with the fewest current socket connections that I would want to route a new socket connection to. What I don't understand is how to effectively route new socket connections to the server you've picked with the low socket count.

I don't imagine this answer would be tied to a specific server implementation, but rather could be applied to most servers. I could easily see myself implementing this with Vert.x, Node.js, or even Perfect.

spierce7
  • **Sidenote**: "how to effectively route new socket connections to the server you've picked with low socket count" - this isn't the only metric... maybe one server has a lot of lazy clients while another server has many active clients - the number of clients isn't really the only valid test. Also, what happens when a lazy client becomes super active? ... it's all impossible to guess. I'm just as interested in discovering the answer, but I would guess round-robin while adjusting for last response-time tests (for a known pre-set query) would be a decent heuristic to apply. – Myst Nov 16 '17 at 04:13
  • If you are interested in the answer, upvote the question :-) Round Robin is exactly what my question is trying to avoid. If your servers go under load, so a new server spins up, round robin would do a pretty bad job of filling out sockets on that new server. Also, once you can route sockets to a specific server, you can have the client reconnect and redistribute the load. – spierce7 Nov 16 '17 at 04:53

3 Answers


First off, you need to define the bounds of the problem you're asking about. If you're truly talking about dynamic horizontal scaling where you spin up and down servers based on total load, then that's an even more involved problem than just figuring out where to route the latest incoming new socket connection.

To solve that problem, you have to have a way of "moving" a socket from one host to another so you can clear connections from a host that you want to spin down (I'm assuming here that true dynamic scaling goes both up and down). The usual way I've seen that done is by engaging a cooperating client where you tell the client to reconnect and when it reconnects it is load balanced onto a different server so you can clear off the one you wanted to spin down. If your client has auto-reconnect logic already (like socket.io does), you can just have the server close the connection and the client will automatically re-connect.

As for load balancing the incoming client connections, you have to decide what load metric you want to use. Ultimately, you need a score for each server process that tells you how "busy" you think it is so you can put new connections on the least busy server. A rudimentary score would just be number of current connections. If you have large numbers of connections per server process (tens of thousands) and there's no particular reason in your app that some might be lots more busy than others, then the law of large numbers probably averages out the load so you could get away with just how many connections each server has. If the use of connections is not that fair or even, then you may have to also factor in some sort of time moving average of the CPU load along with the total number of connections.
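As a rough sketch of that scoring idea (the function names and the CPU weighting are illustrative, not from any particular framework), a combined score could blend connection count with a smoothed CPU average:

```javascript
// Sketch of a combined load score (illustrative names, not a real API).
// Each server reports its connection count and recent CPU samples; the
// score blends the two so a CPU-hot server with few connections still
// ranks as "busy".

// Exponential moving average smooths out short CPU spikes.
function emaCpu(samples, alpha = 0.3) {
  return samples.reduce((avg, s) => alpha * s + (1 - alpha) * avg, samples[0] ?? 0);
}

// Higher score = busier. cpuWeight controls how much CPU load matters
// relative to the raw connection count.
function loadScore(connections, cpuSamples, cpuWeight = 1000) {
  return connections + cpuWeight * emaCpu(cpuSamples);
}

// Pick the server with the lowest score for the next connection.
function leastLoaded(servers) {
  return servers.reduce((best, s) =>
    loadScore(s.connections, s.cpu) < loadScore(best.connections, best.cpu) ? s : best);
}
```

With a large `cpuWeight`, a server at 90% CPU with few sockets scores as busier than a mostly idle server holding many sockets, which is the unevenness the paragraph above warns about.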

If you're going to load balance across multiple physical servers, then you will need a load balancer or proxy service that everyone connects to initially and that proxy can look at the metrics for all currently running servers in the pool and assign the connection to the one with the lowest current score. That can either be done with a proxy scheme or (more scalable) via a redirect so the proxy gets out of the way after the initial assignment.

You could then also have a process that regularly examines your load score (however you decided to calculate it) on all the servers in the cluster and decides when to spin a new server up or when to spin one down or when things are too far out of balance on a given server and that server needs to be told to kick several connections off, forcing them to rebalance.
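That periodic check could be sketched like this (the thresholds and the shape of the decision object are made up purely for illustration):

```javascript
// Sketch of a periodic scaling decision (thresholds are invented for
// illustration). Given the latest score per server, decide whether to
// add a server, retire one, or ask an overloaded server to shed
// connections so its clients rebalance on reconnect.
function scaleDecision(scores, { high = 800, low = 200 } = {}) {
  const avg = scores.reduce((a, b) => a + b, 0) / scores.length;
  if (avg > high) return { action: 'spin-up' };
  if (avg < low && scores.length > 1) return { action: 'spin-down' };
  const max = Math.max(...scores);
  // One server far above the average: kick some of its connections off.
  if (max > 2 * avg) return { action: 'shed', server: scores.indexOf(max) };
  return { action: 'none' };
}
```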

What I don't understand is how to effectively route new socket connections to the server you've picked with low socket count.

As described above, you either use a proxy scheme or a redirect scheme. At a slightly higher cost at connection time, I favor the redirect scheme because it's more scalable when running and creates fewer points of failure for an existing connection. All clients connect to your incoming connection gateway server which is responsible for knowing the current load score for each of the servers in the farm and based on that, it assigns an incoming connection to the host with the lowest score and this new connection is then redirected to reconnect to one of the specific servers in your farm.
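A minimal sketch of that gateway's assignment logic, assuming each farm server periodically reports its score to the gateway (the hostnames and the reporting mechanism are hypothetical):

```javascript
// Sketch of a redirect-style gateway (hostnames are placeholders).
// The gateway tracks a load score per backend, and answers each
// incoming client with the host of the least-busy one; the client
// then opens its WebSocket directly to that host.
const backends = new Map(); // host -> current load score

// Each farm server calls this (e.g. via a heartbeat) with its score.
function reportScore(host, score) {
  backends.set(host, score);
}

// Pick the host to hand to the next incoming client, e.g. in a
// redirect response or an application-level message.
function assignBackend() {
  let bestHost = null;
  let bestScore = Infinity;
  for (const [host, score] of backends) {
    if (score < bestScore) {
      bestScore = score;
      bestHost = host;
    }
  }
  return bestHost;
}
```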


I have also seen load balancing done purely by a custom DNS implementation. Client requests IP address for farm.somedomain.com and that custom DNS server gives them the IP address of the host it wants them assigned to. Each client that looks up the IP address for farm.somedomain.com may get a different IP address. You spin hosts up or down by adding or removing them from the custom DNS server and it is that custom DNS server that has to contain the logic for knowing the load balancing logic and the current load scores of all the running hosts.
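A toy sketch of that DNS idea (the domain, IPs, and TTL are placeholders): the resolver answers each lookup with the least-loaded host and a short TTL so assignments stay fresh:

```javascript
// Sketch of custom-DNS load balancing (all names and IPs are made up).
// The DNS server answers each lookup of farm.somedomain.com with the
// IP of the host it wants that client on.
const hosts = [
  { ip: '10.0.0.1', score: 0 },
  { ip: '10.0.0.2', score: 0 },
];

function answerQuery(name) {
  if (name !== 'farm.somedomain.com') return null;
  const target = hosts.reduce((a, b) => (a.score <= b.score ? a : b));
  target.score += 1; // count the expected new connection against this host
  return { ip: target.ip, ttl: 30 }; // short TTL so clients re-resolve
}
```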

jfriend00
  • I really appreciate your answer. It makes sense, but it doesn't really answer my question. I still don't understand how to have a client socket connection routed to a specific server. I don't understand how to route a socket connection request. It's possible the question I'm asking is so basic that you aren't expecting it. If I get a socket connection request and then use something like Hazelcast or Redis to identify the server with the lowest load, how do I get the request to that server and have the client create a socket directly to that server? – spierce7 Nov 16 '17 at 05:13
  • 1
    @spierce7 - You either use a proxy model or a redirect model. In the proxy model, it works like NGINX does load balancing. Client connects to the proxy and the proxy then connect to the approrriate host and serves as the middleman forwarding packets both ways. In this situation, I would prefer the redirect model. Client connects to load balancer and it redirected to a new IP address to which they make a new connection. – jfriend00 Nov 16 '17 at 05:16
  • 1
    @spierce7 - The redirect can be done either with a 303 or 307 on the initial webSocket connection or it can be done at the application level where the webSocket connection initiates to the load balancer and then the load balancer sends the client an application message over the webSocket which tells them to reconnect to a new host. I think there is some question whether all webSocket clients support the 3xx redirect so it might be more reliable to do at the application level. – jfriend00 Nov 16 '17 at 05:19
  • So you are saying with a proper redirect, the client would make a connection directly to the selected server? I'd assumed the socket would always go through the load balancer with that approach. – spierce7 Nov 16 '17 at 05:54
  • How would the 307 redirect work? Would that tell the client to make another request to the resource, or would the server be properly routing the request to an internal server? Do you have any links to an example of such an approach? – spierce7 Nov 16 '17 at 05:57
  • @spierce7 - The 307 redirect would return a new host name for the client to connect to. With the redirect scheme, the load balancer gets out of the way after initial load balancing (which I think is a good thing for both scalability and reliability). – jfriend00 Nov 16 '17 at 06:15
  • I think I understand now. So all of the socket servers would need to be directly accessible via a request? – spierce7 Nov 16 '17 at 06:16
  • @spierce7 - Yes, for the redirect model. Not for the proxy model. – jfriend00 Nov 16 '17 at 06:17
  • I have the full picture in my head now. Thanks for the back and forth! Just to make sure I understand the proxy approach: with the proxy approach, the end result would be the client has a connection to the proxy, and the proxy has a connection to the internal server? The server sends the data to the proxy, and then the proxy to the client? This is exactly the approach I've been trying to avoid. – spierce7 Nov 16 '17 at 06:22
  • @spierce7 - Yes, that's the proxy model. The advantage of the proxy model is that it can be invisible to the client and the actual servers do not have to be directly reachable, but it has scale and failover issues for long lasting connections and you ultimately have to figure out how to cluster the proxy too. – jfriend00 Nov 16 '17 at 06:24
  • Yes, that's why I wanted to avoid it :-) Thanks again for the back and forth! – spierce7 Nov 16 '17 at 06:25
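The application-level redirect discussed in the comments above could be sketched on the client side like this (the message shape and hostnames are hypothetical, not from any real protocol):

```javascript
// Sketch of an application-level redirect (hypothetical message shape).
// The client connects to the balancer first; if the balancer answers
// with a "reconnect" message, the client dials the assigned host
// directly instead. Ordinary traffic passes through untouched.
function handleBalancerMessage(raw, connect) {
  const msg = JSON.parse(raw);
  if (msg.type === 'reconnect') {
    // Balancer picked a host for us; open the real connection there.
    return connect(`wss://${msg.host}/socket`);
  }
  return null; // not a redirect; normal application message
}
```

Doing the redirect at this layer sidesteps the question of whether a given WebSocket client library follows 3xx responses during the handshake.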

Route the websocket requests to a load balancer that makes the decision about where to send the connections.

As an example, HAProxy has a leastconn method for long connections that picks the least recently used server with the lowest connection count.
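A minimal HAProxy sketch of that setup (hostnames, IPs, ports, and timeouts are placeholders):

```
# Minimal HAProxy sketch (addresses and timeouts are placeholders).
# "balance leastconn" sends each new connection to the backend with
# the fewest established connections - a good fit for long-lived
# WebSocket sessions.
frontend ws_front
    bind *:80
    default_backend ws_back

backend ws_back
    balance leastconn
    timeout tunnel 1h          # keep idle WebSocket tunnels open
    server ws1 10.0.0.1:8080 check
    server ws2 10.0.0.2:8080 check
```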

The HAProxy backend server weightings can also be modified by external inputs; @jfriend00 detailed the technicalities of weighting in their answer.

Matt

I found this project that might be useful: https://github.com/apundir/wsbalancer

A snippet from the description:

Websocket balancer is a stateful reverse proxy for websockets. It distributes incoming websockets across multiple available backends. In addition to load balancing, the balancer also takes care of transparently switching from one backend to another in case of mid-session abnormal failure. During this failover, the remote client connection is retained as-is, so the remote client does not even see the failover. Every attempt is made to ensure that no messages are dropped during this failover.

Regarding your question: that new connection will be routed by the load balancer if configured to do so.

As @Matt mentioned, for example with HAProxy using the leastconn option.

Vincent Gerris
  • Wouldn't this suffer from a scaling problem since it's proxying all web sockets? – spierce7 Nov 17 '20 at 00:51
  • There is a 64k client port limit per IP, I guess this balancer setup will not scale beyond 32k, as it consumes 2 websockets per inbound connection. – Ashwin Prabhu Aug 08 '22 at 12:38