Play framework resource starvation after a few days

Question

I am experiencing an issue in Play 2.5.8 (Java) where database-related service endpoints start timing out after a few days, even though the server's CPU and memory usage seem fine. Endpoints that do not access the DB continue to work perfectly.

The application runs on a t2.medium EC2 instance with a t2.medium MySQL RDS, both in the same availability zone. Most HTTP calls do lookups/updates to the database at around 8-12 requests per second, and there are also ±800 WebSocket connections/actors with ±8 requests/second (90% of the WebSocket messages do not access the database). DB operations are mostly simple lookups & updates taking around 100ms.

When using only the default thread pool it took about 2 days to reach the deadlock, and after moving the database requests to a separate thread pool as per https://www.playframework.com/documentation/2.5.x/ThreadPools#highly-synchronous, it improved but only to about 4 days.
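For readers unfamiliar with the pattern, the idea of moving blocking database calls to their own pool can be sketched in plain Java as below. This is only an illustration: `BlockingDbExecutor` is a hypothetical helper, and it builds its own `ExecutorService`, whereas in a Play application the executor would normally come from the dispatcher configured in application.conf.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical helper (not part of Play): runs blocking work on a dedicated
// fixed-size pool so that request-handling threads are never parked on JDBC.
class BlockingDbExecutor {
    private final ExecutorService pool;

    BlockingDbExecutor(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Returns immediately; the blocking call executes on the dedicated pool.
    <T> CompletionStage<T> run(Supplier<T> blockingCall) {
        return CompletableFuture.supplyAsync(blockingCall, pool);
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

A controller action would then return something like `db.run(() -> dao.findById(id))` (with `dao` being whatever blocking data access object the application uses) and map the result to an HTTP response once the stage completes.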

This is my current thread config in application.conf:

akka {
  actor {
    guardian-supervisor-strategy = "actors.RootSupervisionStrategy"
  }
  loggers = ["akka.event.Logging$DefaultLogger",
    "akka.event.slf4j.Slf4jLogger"]
  loglevel = WARNING

  ## This pool handles all HTTP & WebSocket requests
  default-dispatcher {
      executor = "thread-pool-executor"
      throughput = 1
      thread-pool-executor {
        fixed-pool-size = 64 
      }
  }

  db-dispatcher {
    type = Dispatcher
    executor = "thread-pool-executor"
    throughput = 1
    thread-pool-executor {
      fixed-pool-size = 210 
    }
  }
}

Database configuration:

play.db.pool="default"
play.db.prototype.hikaricp.maximumPoolSize=200
db.default.driver=com.mysql.jdbc.Driver

I have played around with the number of connections in the DB pool and adjusted the sizes of the default and db-dispatcher pools, but it doesn't seem to make any difference. It feels like I'm missing something fundamental about Play's thread pools & configuration, as I don't think the load on the server should be an issue for Play to handle.
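As a rough sanity check on the load, Little's law (average in-flight requests = arrival rate × average latency) can be applied to the figures above. The sketch below assumes the peak HTTP rate of 12 requests/second, that 10% of the ±8 WebSocket messages/second reach the database, and a flat 100ms per DB operation; `PoolSizeEstimate` is a made-up name for illustration.

```java
// Back-of-the-envelope estimate, not a measurement.
class PoolSizeEstimate {
    // Little's law: average concurrent requests = rate (req/s) * latency (s).
    static double concurrentRequests(double requestsPerSecond, double latencySeconds) {
        return requestsPerSecond * latencySeconds;
    }

    public static void main(String[] args) {
        double httpRate = 12.0;      // upper end of the 8-12 req/s quoted above
        double wsDbRate = 0.10 * 8;  // 10% of ~8 WebSocket msg/s touch the DB
        double inFlight = concurrentRequests(httpRate + wsDbRate, 0.100);
        System.out.printf("Average DB operations in flight: %.2f%n", inFlight);
    }
}
```

Under these assumptions only one or two DB operations are in flight on average, so a pool of 200 connections and 210 threads is far larger than the steady-state load needs, which fits the observation that resizing the pools makes no difference.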

Have you tried attaching a debugger that could tell you what all your threads are doing? Can you reproduce the behavior on your local machine by simulating more requests (so that you don't have to wait for days)? – rethab, Oct 08 '16 at 15:58
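One lightweight way to follow this suggestion without a debugger is to take a thread dump (for example with the JDK's `jstack <pid>`) or to summarise thread states from inside the JVM. The sketch below is a minimal illustration using only the standard library; `ThreadStateSummary` is a hypothetical class name, and in a real application it would have to be exposed through something like an admin endpoint or a scheduled log line.

```java
import java.util.EnumMap;
import java.util.Map;

// Counts live JVM threads by state. Many BLOCKED or WAITING threads in the
// DB pool would point at thread or connection pool exhaustion.
class ThreadStateSummary {
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByState().forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```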

score 1 · Answer 1 · edited May 23 '17 at 12:00


After more investigation I found that the issue is not related to the thread pool configuration at all, but rather to TCP connections that build up due to WebSocket reconnections until the server (or Play framework) cannot accept any more connections. When this happens, only established TCP connections are serviced, which mostly means the existing WebSocket connections.

I could not yet determine why the connections are not managed/closed properly.
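To watch for this kind of build-up, the number of TCP connections per state can be tracked over time. The sketch below assumes a Linux host and parses the `/proc/net/tcp` format, where the fourth column holds the connection state as a hex code; `TcpStateCount` is a hypothetical helper and only names the few states relevant here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TcpStateCount {
    // Linux kernel TCP state codes as printed (in hex) in /proc/net/tcp.
    static String stateName(int code) {
        switch (code) {
            case 0x01: return "ESTABLISHED";
            case 0x06: return "TIME_WAIT";
            case 0x08: return "CLOSE_WAIT";
            case 0x0A: return "LISTEN";
            default:   return "OTHER";
        }
    }

    // Counts sockets per state; the header row and malformed rows are skipped.
    static Map<String, Integer> countByState(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 4 || !fields[0].endsWith(":")) continue;
            counts.merge(stateName(Integer.parseInt(fields[3], 16)), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countByState(Files.readAllLines(Paths.get("/proc/net/tcp"))));
    }
}
```

A steadily growing CLOSE_WAIT count usually means the remote side has closed the connection but the local application never closed its socket. Note that a JVM listening on a dual-stack socket may show its connections in `/proc/net/tcp6` instead, which uses the same column layout.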

My issue relates to this question:

Play 2.5 WebSocket Connection Build


answered Oct 27 '16 at 07:15

mdw


How does this explain the proper responses from non-DB-related endpoints? Obviously new connections are established when you make such calls. – Bruno Batarelo Nov 07 '18 at 12:43
