I am experiencing an issue in Play 2.5.8 (Java) where database-related service endpoints start timing out after a few days, even though the server's CPU and memory usage seem fine. Endpoints that do not access the DB continue to work perfectly.
The application runs on a t2.medium EC2 instance with a t2.medium MySQL RDS instance, both in the same availability zone. Most HTTP calls do lookups/updates to the database, at around 8-12 requests per second, and there are also ±800 WebSocket connections/actors with ±8 requests/second (90% of the WebSocket messages do not access the database). DB operations are mostly simple lookups and updates taking around 100ms.
When using only the default thread pool, it took about 2 days to reach the deadlock. After moving the database requests to a separate thread pool, as per https://www.playframework.com/documentation/2.5.x/ThreadPools#highly-synchronous, this improved, but only to about 4 days.
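A minimal sketch of that pattern, for readers unfamiliar with it. It is not the application's actual code: `DbOffload` and `runBlocking` are hypothetical names, and a plain `ExecutorService` stands in for the `db-dispatcher` that would normally be looked up from Akka.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

public class DbOffload {
    // Assumption: a fixed pool standing in for the db-dispatcher,
    // reserved for blocking JDBC calls only.
    private final ExecutorService dbPool;

    public DbOffload(int poolSize) {
        this.dbPool = Executors.newFixedThreadPool(poolSize);
    }

    // Runs the blocking call on the dedicated pool so that the thread
    // handling the HTTP or WebSocket request is not held while it waits.
    public <T> CompletionStage<T> runBlocking(Supplier<T> blockingCall) {
        return CompletableFuture.supplyAsync(blockingCall, dbPool);
    }

    public void shutdown() {
        dbPool.shutdown();
    }

    public static void main(String[] args) {
        DbOffload db = new DbOffload(4);
        // Hypothetical blocking lookup; a real one would run a JDBC query here.
        CompletionStage<String> result =
                db.runBlocking(() -> "ran on " + Thread.currentThread().getName());
        System.out.println(result.toCompletableFuture().join());
        db.shutdown();
    }
}
```

In a Play action, the returned `CompletionStage` would be mapped to a `Result`, so that only the database work occupies the dedicated pool.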
This is my current thread config in application.conf:
akka {
  actor {
    guardian-supervisor-strategy = "actors.RootSupervisionStrategy"
  }
  loggers = ["akka.event.Logging$DefaultLogger",
    "akka.event.slf4j.Slf4jLogger"]
  loglevel = WARNING

  ## This pool handles all HTTP & WebSocket requests
  default-dispatcher {
    executor = "thread-pool-executor"
    throughput = 1
    thread-pool-executor {
      fixed-pool-size = 64
    }
  }

  db-dispatcher {
    type = Dispatcher
    executor = "thread-pool-executor"
    throughput = 1
    thread-pool-executor {
      fixed-pool-size = 210
    }
  }
}
Database configuration:
play.db.pool="default"
play.db.prototype.hikaricp.maximumPoolSize=200
db.default.driver=com.mysql.jdbc.Driver
I have played around with the number of connections in the DB pool and with the sizes of the default and db-dispatcher pools, but it doesn't seem to make any difference. It feels like I'm missing something fundamental about Play's thread pools and configuration, as I don't think the load on the server should be an issue for Play to handle.
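As a rough sanity check on that last point, Little's law (average operations in flight = arrival rate × time per operation) can be applied to the figures above. The sketch assumes the upper bound of 12 HTTP requests per second all hit the database, that 10% of the 8 WebSocket messages per second do, and that each operation takes 100ms. `LoadEstimate` and `averageInFlight` are illustrative names.

```java
public class LoadEstimate {
    // Little's law: average number of operations in flight
    // equals arrival rate multiplied by time spent per operation.
    public static double averageInFlight(double opsPerSecond, double secondsPerOp) {
        return opsPerSecond * secondsPerOp;
    }

    public static void main(String[] args) {
        double httpDbOps = 12.0;      // assumed upper bound, all HTTP calls hit the DB
        double wsDbOps = 8.0 * 0.10;  // 10% of WebSocket messages hit the DB
        double secondsPerOp = 0.100;  // around 100ms per lookup/update

        double inFlight = averageInFlight(httpDbOps + wsDbOps, secondsPerOp);
        System.out.printf("average DB operations in flight: %.2f%n", inFlight);
    }
}
```

Under these assumptions the steady-state average is a little over one operation in flight, far below the 200 connections and 210 threads configured.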