
I have a Spark Scala job with long-running tasks that I posted about earlier here, and I was able to figure out that the job is getting stuck because of the connection to Redis. I'm seeing notifications that my Redis cluster in Elasticache is maxing out on CPU, and I'm not sure what the right fix is. I'm using the Jedis client to connect, and right now all of my tasks just hang and never complete. I confirmed that Redis is the bottleneck by commenting out the reads/writes, after which the job completed.

I'm running this job on EMR with 24 spark.executor.instances and 12 spark.executor.cores.

The Redis configuration is:

RedisClient.port = 6379
RedisClient.poolConfig.setMaxIdle(300)
RedisClient.poolConfig.setMaxTotal(300)
RedisClient.poolConfig.setMaxWaitMillis(150000)
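
For reference, here is roughly how the pool is built (the endpoint name is a placeholder for this post; only the three pool values above are my real settings):

    import redis.clients.jedis.{JedisPool, JedisPoolConfig}

    object RedisClient {
      val port = 6379

      // Pool sizing from above: up to 300 connections per executor JVM,
      // blocking for up to 150 seconds when the pool is exhausted.
      val poolConfig = new JedisPoolConfig()
      poolConfig.setMaxIdle(300)
      poolConfig.setMaxTotal(300)
      poolConfig.setMaxWaitMillis(150000L)

      val host = "my-elasticache-endpoint" // placeholder, not the real endpoint

      lazy val pool = new JedisPool(poolConfig, host, port)
    }

Note that this object (and therefore the pool) exists once per executor JVM, so with 24 executors the cluster could see up to 24 × 300 = 7,200 open connections and up to 24 × 12 = 288 Redis commands in flight at any moment.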

I am reading from/writing to Redis in a loop, which you can see in my previous post. This code has worked before on a smaller data set, so I think I need to adjust the pool settings, though I'm not sure what to change them to or whether I'm missing a setting. What's the best way to figure out the optimal settings for a Redis pool?
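
The loop itself is in the previous post, so below is only a rough sketch of the kind of change I've been considering: pipelining the writes per partition so that each record doesn't cost its own round-trip. The rdd, keys, and values here are made up for this post.

    import org.apache.spark.sql.SparkSession
    import redis.clients.jedis.Jedis

    val spark = SparkSession.builder().appName("redis-writes").getOrCreate()

    // Hypothetical stand-in for the real data: an RDD of key/value pairs.
    val rdd = spark.sparkContext.parallelize(Seq(("k1", "v1"), ("k2", "v2")))

    rdd.foreachPartition { records =>
      val jedis: Jedis = RedisClient.pool.getResource
      try {
        records.grouped(500).foreach { batch =>      // batch size would need tuning
          val pipeline = jedis.pipelined()
          batch.foreach { case (key, value) => pipeline.set(key, value) }
          pipeline.sync()                            // one round-trip per batch
        }
      } finally {
        jedis.close()                                // return the connection to the pool
      }
    }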

sgallagher
  • Slow down your IO to Redis. Your Redis server cannot handle it at this rate. – sarveshseri Jul 03 '21 at 10:20
  • What is the best way to slow down IO to Redis? Would it be through a setting? – sgallagher Jul 06 '21 at 16:11
  • You need to make sure that your clients stop making too many calls so that the Redis server gets to breathe. You are piling work onto it faster than it can handle. You need to either scale out your Redis cluster or slow down your clients. Implement RateLimiters in your clients (see the sketch after these comments). – sarveshseri Jul 06 '21 at 16:33
  • Thank you, doing so did solve our issue. – sgallagher Jul 07 '21 at 19:23
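
Update: per the comments, rate limiting the clients is what fixed this for us. Below is only a rough sketch of that idea (using Guava's RateLimiter and reusing the hypothetical rdd and RedisClient pool from the sketches above; 200 permits/second is just a placeholder to tune against the Elasticache CPU metrics), not the exact code we ran:

    import com.google.common.util.concurrent.RateLimiter
    import redis.clients.jedis.Jedis

    // Cap each task at roughly 200 Redis commands per second; with
    // 24 executors x 12 cores that is at most ~57,600 commands/sec cluster-wide.
    rdd.foreachPartition { records =>
      val limiter = RateLimiter.create(200.0)   // permits per second, a number to tune
      val jedis: Jedis = RedisClient.pool.getResource
      try {
        records.foreach { case (key, value) =>
          limiter.acquire()                     // blocks until a permit is available
          jedis.set(key, value)
        }
      } finally {
        jedis.close()                           // return the connection to the pool
      }
    }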

0 Answers