
Given the following code (using Spring WebFlux and RSocket):

@MessageMapping("hello.{name}")
public Mono<String> greet(@DestinationVariable String name) {
    return Mono.just("Hello " + name); // or assume this makes a slow HTTP call
}

The questions are:

  1. When the server is under heavy load, will the client just send the request to the server anyway, and the server will buffer it? Or is there actually some mechanism that makes the client wait until the server signals that it is ready?

  2. If the client sends the request anyway, then at some point the server will run out of memory buffering all the excess requests. How do we usually handle that? Can the Netifi broker help in this situation? (Assume it is burst traffic and we can't scale out or scale up the servers in time.)

Riko Nagatama

1 Answer


Real Resilience with RSocket

RSocket, as a network protocol, treats resilience as a first-class citizen. In RSocket, resilience is exposed in two ways:

Resilience via flow-control (a.k.a. Backpressure)

If you do streaming, your subscriber can control the number of elements being delivered, so it will not be overwhelmed by the server. The animation below shows how the Reactive Streams spec is implemented at the RSocket protocol level:

RSocket And Reactive-Streams

As you may notice, just as in Reactive Streams, the Subscriber (left-hand side) requests data via its Subscription. That request is transformed into a binary frame and sent over the network; once the receiver gets that frame, it decodes it and delivers it to the corresponding subscription on the remote side, so that the remote Publisher produces exactly the requested number of messages.
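
For the request/response interaction in the question, each request carries an implicit demand of one, so flow control is easiest to see with a stream. Below is a minimal sketch assuming a hypothetical hello.stream route and an injected Spring RSocketRequester (neither appears in the question): the requester's limitRate caps outstanding demand, and that demand travels to the responder as REQUEST_N frames.

import org.springframework.messaging.handler.annotation.MessageMapping;
import org.springframework.messaging.rsocket.RSocketRequester;
import org.springframework.stereotype.Controller;
import reactor.core.publisher.Flux;

@Controller
class GreetingStreamController {

    // Responder side: Flux.range honours downstream demand, so it only
    // produces as many elements as the remote requester has asked for.
    @MessageMapping("hello.stream")
    public Flux<String> greetStream() {
        return Flux.range(0, 1_000).map(i -> "Hello #" + i);
    }
}

class GreetingStreamClient {

    // Requester side: limitRate(16) caps outstanding demand, so REQUEST_N
    // frames of at most 16 cross the connection and the responder never
    // produces more than the requester is prepared to consume.
    void consume(RSocketRequester requester) {
        requester.route("hello.stream")
                 .retrieveFlux(String.class)
                 .limitRate(16)
                 .subscribe(System.out::println);
    }
}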

Resilience via Leasing

On the other hand, beyond streaming, the server, which usually manages multiple connections, has to withstand the load, and when it is close to failure it should be able to prevent any further interactions. For that purpose, RSocket brings a built-in protocol feature called Leasing. In a nutshell, Leasing is rate limiting built into the protocol, where the request limit is dynamic and entirely controlled by the Responder side.

A few phases can be distinguished in that process:

  1. Setup phase - happens when a Client connects to a Server; both sides have to provide particular flags to agree that they are ready to respect Leasing.
  2. Silence phase - in this phase, a Requester cannot do anything. There is a strict rule: the Requester is not allowed to send anything unless the Responder permits it. If the Requester tries to send a request, it fails immediately without any frame being sent to the remote side.
  3. Lease provisioning phase - once the Responder has assessed its capacity and is ready to receive requests from the Requester, it sends a specific frame called LEASE. That frame contains two crucial values: Number of Requests and Time to Live. The first tells the Requester how many requests it may send to the Responder. The second says how long that allowance is valid, so if the Requester has not used it all by then, the allowance expires and any further requests are rejected on the Requester side.

This interaction is depicted in the following animation:

RSocket And Leasing

Note

The lease strategy works on a per-connection basis, which means that when you issue a lease, you issue it for a single particular remote Requester, not for all Requesters connected to your server. On the other hand, some math can be applied to split the whole server capacity between all connected Requesters depending on some metrics, as sketched below.
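
For completeness, here is a hedged sketch of switching leasing on, modelled on the lease sample in the RSocket-Java repository and written against the rsocket-java 1.0.x API (Leases and Lease.create(timeToLiveMillis, numberOfRequests)); the exact signatures have evolved since, so treat it as an illustration rather than the definitive setup:

import io.rsocket.Payload;
import io.rsocket.RSocket;
import io.rsocket.core.RSocketConnector;
import io.rsocket.core.RSocketServer;
import io.rsocket.lease.Lease;
import io.rsocket.lease.Leases;
import io.rsocket.transport.netty.client.TcpClientTransport;
import io.rsocket.transport.netty.server.TcpServerTransport;
import io.rsocket.util.DefaultPayload;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

import java.time.Duration;

public class LeaseSketch {

    public static void main(String[] args) {
        // Responder: on every 10-second tick it grants the connected requester
        // a fresh lease of 50 requests, valid for 10 seconds (TTL == interval,
        // so the allowance is renewed whether or not it was fully used).
        RSocketServer.create((setup, sendingSocket) -> Mono.just(new RSocket() {
                @Override
                public Mono<Payload> requestResponse(Payload payload) {
                    return Mono.just(DefaultPayload.create("Hello " + payload.getDataUtf8()));
                }
            }))
            .lease(() -> Leases.create()
                .sender(stats -> Flux.interval(Duration.ZERO, Duration.ofSeconds(10))
                    .map(tick -> Lease.create(10_000, 50))))
            .bind(TcpServerTransport.create(7000))
            .block();

        // Requester: enables lease support in the SETUP frame; any request sent
        // without a valid lease fails locally instead of ever reaching the server.
        RSocket requester = RSocketConnector.create()
            .lease(() -> Leases.create())
            .connect(TcpClientTransport.create("localhost", 7000))
            .block();

        requester.requestResponse(DefaultPayload.create("world"))
                 .map(Payload::getDataUtf8)
                 .doOnNext(System.out::println)
                 .block();
    }
}

With the TTL equal to the reissue interval, the requester never holds a stale allowance for long; if the server becomes unhealthy it simply stops sending LEASE frames, and new requests are rejected on the client side instead of piling up in server memory, which is exactly the scenario from question 2.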

Where to find an example of both

There are a couple of good samples that demonstrate how flow-control and leasing can be used with RSocket. All of them can be found in the official Git repo of the RSocket-Java project here

Oleh Dokuka
  • For leasing (the number of requests and time to live), is it manually configured at the application level? Is the lease shared across all requesters? – Riko Nagatama Sep 27 '20 at 10:45
  • The leasing strategy can be configured separately on both sides (on the Client and on the Server, since the protocol is peer-to-peer, so the Responder logic can be implemented on either end). For the server, the lease strategy is created per connection, thus the mechanism that sends leases is per connection. That said, if you have allowed 100 requests for connection A and you have only 150 in total, you may do the math and issue, for example, only 50 or 25 to another one. – Oleh Dokuka Sep 27 '20 at 10:50
  • Btw, I think it's quite hard to calculate the number of requests for leasing, and it's even more complicated when combined with time to live. Do you think it's good practice to calculate on a per-request basis (and can we even do that, maybe with a TTL of 0)? Will it significantly slow down the communication because we need to go through all the leasing phases for every request? – Riko Nagatama Sep 27 '20 at 13:08
  • This can be done easily using the Netflix concurrency-limits lib. We have a PR in progress which changes some internals and adds a number of examples showing how to calculate the lease when there are multiple connections: https://github.com/rsocket/rsocket-java/pull/885 – Oleh Dokuka Sep 27 '20 at 13:32
  • Also, we are working on new Lease changes, but they are not there yet -> https://github.com/rsocket/rsocket/pull/311 – Oleh Dokuka Sep 27 '20 at 13:33
  • Finally, TTL is important: if your server gets stuck unexpectedly, an unbounded TTL prevents the client from ever stopping its requests (since the TTL is unbounded, if a significant number of leases were issued, nothing stops the client from continuing to use them). Please note, an unbounded TTL == Integer.MAX_VALUE, and 0 is an invalid TTL value. – Oleh Dokuka Sep 27 '20 at 13:36
  • Also, you don't have to recalculate the lease for every request. You may run an interval function (with the interval duration equal to the TTL specified for the lease) and reissue a lease on every tick, regardless of whether the whole request allowance was used or not (see the sketch after these comments). – Oleh Dokuka Sep 27 '20 at 13:39
  • I see, looking forward to seeing the example of how to do the lease calculation. It would be much appreciated if there were also an example of how to use Netflix concurrency-limits with the lease calculation. – Riko Nagatama Sep 27 '20 at 14:08
  • I have sent a link in the previous comments; following that link you will find a PR. If you look at the code, you will find a Netflix concurrency-limits example. – Oleh Dokuka Sep 27 '20 at 14:16
  • Also, please vote if this PR is important to you, so our team will review the priority of those changes. – Oleh Dokuka Sep 27 '20 at 14:44
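
As a follow-up to the comments above, here is a hedged sketch combining the interval-based reissue with the per-connection math: it divides an assumed server-wide budget of 150 requests per 10-second window evenly across the currently open connections. The SharedCapacityLeases helper, the budget, and the numbers are illustrative, not taken from the answer or the linked PR, and it relies on the same rsocket-java 1.0.x Leases API as the earlier sketch.

import io.rsocket.lease.Lease;
import io.rsocket.lease.Leases;
import reactor.core.publisher.Flux;

import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: splits an assumed global budget evenly across however
// many connections are currently open, reissuing each connection's lease on
// every TTL tick (connection-close bookkeeping is omitted for brevity).
final class SharedCapacityLeases {

    private static final int GLOBAL_BUDGET = 150;          // assumed server-wide capacity
    private static final Duration TTL = Duration.ofSeconds(10);

    private final AtomicInteger connections = new AtomicInteger();

    // Pass this supplier to RSocketServer#lease(...); it is invoked once per connection.
    Supplier<Leases<?>> perConnection() {
        return () -> {
            connections.incrementAndGet();
            return Leases.create()
                .sender(stats -> Flux.interval(Duration.ZERO, TTL)
                    .map(tick -> {
                        // Recompute the slice on every tick so connections that
                        // opened since the last tick shrink the next allowance.
                        int share = Math.max(1, GLOBAL_BUDGET / Math.max(1, connections.get()));
                        return Lease.create((int) TTL.toMillis(), share);
                    }));
        };
    }
}

In a real responder you would also decrement the counter when a connection closes (e.g. from the accepted socket's onClose()) and, as suggested above, drive the budget from a live estimate such as the one provided by Netflix concurrency-limits rather than a fixed constant.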