What is the recommended Redisson configuration to avoid timeouts when connected to AWS ElastiCache?

Question

We are using Redisson to connect to a replicated Redis on AWS ElastiCache with 1 master and 2 replica nodes.

The app makes use of a number of RLocalCachedMaps, Locks and a few thousand Topics to track user state (Topics and subscriptions come and go as users go online and offline).

However, we frequently get a series of RedisTimeoutExceptions. Originally these appeared after the server had been running for several days and would then occur continuously until the server was either restarted or crashed with an out-of-memory error. That led me to think we were running out of available subscriptions. However, our settings (below) should support over 100,000 subscriptions if I understand them correctly, and we are nowhere near that.
Furthermore, some of these exceptions occur during warm-up, when load on the server is relatively light. After a few exceptions the connections sort themselves out and there are no major problems for several days, which indicates it is not purely a subscription problem. The commands are simple lock/publish/subscribe calls each time, rather than complex batches.

The load on the AWS ElastiCache nodes is minor at all times, and our server is deployed on an AWS EC2 instance, so it should have relatively good connectivity!

The 2 exceptions we get in quantity occur either when taking locks or when subscribing to topics:

Caused by: org.redisson.client.RedisTimeoutException: Subscribe timeout: (7500ms)
at org.redisson.command.CommandAsyncService.syncSubscription(CommandAsyncService.java:142) ~[redisson-3.8.2.jar!/:na]
at org.redisson.RedissonLock.lockInterruptibly(RedissonLock.java:149) ~[redisson-3.8.2.jar!/:na]
at org.redisson.RedissonLock.lockInterruptibly(RedissonLock.java:136) ~[redisson-3.8.2.jar!/:na]
at org.redisson.RedissonLock.lock(RedissonLock.java:118) ~[redisson-3.8.2.jar!/:na]

and

java.util.concurrent.CompletionException: org.redisson.client.RedisTimeoutException
at org.redisson.misc.RedissonPromise.await(RedissonPromise.java:197) ~[redisson-3.8.2.jar!/:na]
at org.redisson.misc.RedissonPromise.await(RedissonPromise.java:206) ~[redisson-3.8.2.jar!/:na]
at org.redisson.command.CommandAsyncService.syncSubscription(CommandAsyncService.java:141) ~[redisson-3.8.2.jar!/:na]
at org.redisson.RedissonTopic.addListener(RedissonTopic.java:133) ~[redisson-3.8.2.jar!/:na]
at org.redisson.RedissonTopic.addListener(RedissonTopic.java:109) ~[redisson-3.8.2.jar!/:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_111]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_111]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_111]
Caused by: org.redisson.client.RedisTimeoutException: null
at org.redisson.pubsub.PublishSubscribeService$4.run(PublishSubscribeService.java:220) ~[redisson-3.8.2.jar!/:na]
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:670) ~[netty-common-4.1.30.Final.jar!/:4.1.30.Final]
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:745) ~[netty-common-4.1.30.Final.jar!/:4.1.30.Final]
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:473) ~[netty-common-4.1.30.Final.jar!/:4.1.30.Final]

Our Configuration is:

"subscriptionConnectionMinimumIdleSize":32,
"subscriptionConnectionPoolSize":128,
"slaveConnectionMinimumIdleSize":32,
"slaveConnectionPoolSize":128,
"masterConnectionMinimumIdleSize":64,         
"masterConnectionPoolSize":128,
"subscriptionsPerConnection": 1000,
"timeout": 3000,
"retryAttempts": 3,
"retryInterval": 1500,
"readMode": "SLAVE",
"subscriptionMode": "MASTER"
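
For reference, the "over 100,000 subscriptions" estimate can be checked with a small calculation. This is only a sketch: it assumes the subscription limit is simply subscriptionConnectionPoolSize multiplied by subscriptionsPerConnection on the single master node (since subscriptionMode is MASTER), and the class and method names are made up for illustration. Note that, going by the first stack trace, waiting on a lock also goes through a subscription, so locks would count against the same budget.

```java
// Hypothetical helper, not part of Redisson: estimates the subscription
// ceiling implied by the pool settings in the configuration above.
public class SubscriptionCapacity {

    // Assumption: each pooled subscription connection can carry up to
    // subscriptionsPerConnection channel subscriptions.
    static long maxSubscriptions(int subscriptionConnectionPoolSize,
                                 int subscriptionsPerConnection) {
        return (long) subscriptionConnectionPoolSize * subscriptionsPerConnection;
    }

    public static void main(String[] args) {
        // Values taken from the configuration above.
        long capacity = maxSubscriptions(128, 1000);
        System.out.println("Estimated subscription capacity: " + capacity);
    }
}
```

With the values above this gives 128,000, so a few thousand Topics plus lock subscriptions should fit comfortably if the assumption holds.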

I have read the Redisson FAQ on timeouts, but our timeout exceptions are not obviously server-side or client-side, so I am unsure which timeout parameter would be better to tweak. Furthermore, given that the timeouts are 7.5 seconds, that is a pretty long time for user requests to be waiting. Similarly, I can't find documentation on recommended values for the connection pool sizes or subscriptions per connection, or on what would be sensible values for a production deployment.
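
One observation on the 7.5 second figure: the 7500 ms in the "Subscribe timeout" message is exactly timeout + retryAttempts * retryInterval for the settings above (3000 + 3 * 1500). Whether Redisson 3.8.2 derives the subscribe timeout in precisely this way is an assumption on my part, and the names below are hypothetical, but the sketch shows how lowering any of the three values would shorten the worst-case wait.

```java
// Hypothetical helper, not part of Redisson: reproduces the arithmetic that
// matches the 7500 ms subscribe timeout seen in the stack trace.
public class SubscribeTimeout {

    // Assumption: the worst-case wait is the base timeout plus one
    // retryInterval for each retry attempt.
    static int worstCaseWaitMillis(int timeout, int retryAttempts, int retryInterval) {
        return timeout + retryAttempts * retryInterval;
    }

    public static void main(String[] args) {
        // Values taken from the configuration above.
        System.out.println(worstCaseWaitMillis(3000, 3, 1500) + " ms");
        // Example of a tighter setting: fewer, quicker retries.
        System.out.println(worstCaseWaitMillis(1000, 2, 500) + " ms");
    }
}
```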

No, Sorry Brad, as far as I know that issue was still ongoing when I left that company. — Draconas, May 04 '22 at 14:26
