
On a legacy production application we were having an issue where the application crashed because it ran out of connections (the default was 100). As a temporary fix we decided to increase the available connections to 500, but when the application reached 200 connections it simply stopped itself, with no errors in the logs, just like a clean shutdown.
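For context, the increase corresponds to a pool configuration along these lines. This is a minimal sketch assembled from the DataSource property dump shown further down; the real application may set these values in application.properties instead of a @Bean, and the class name is illustrative:

import org.apache.tomcat.jdbc.pool.DataSource;
import org.apache.tomcat.jdbc.pool.PoolProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class DataSourceConfig {

    // Values mirror the property dump below; maxActive was raised from the default 100 to 500.
    @Bean
    public DataSource dataSource() {
        PoolProperties p = new PoolProperties();
        p.setUrl("jdbc:mysql://127.0.0.1:3306/db_name?createDatabaseIfNotExist=true");
        p.setDriverClassName("com.mysql.jdbc.Driver");
        p.setUsername("username");
        p.setPassword("********");
        p.setMaxActive(500);
        p.setMaxIdle(500);
        p.setMinIdle(10);
        p.setInitialSize(10);
        p.setMaxWait(30000);
        p.setTestOnBorrow(true);
        p.setValidationQuery("SELECT 1");
        return new DataSource(p);
    }
}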

I added a couple of log statements, emitted every 15 seconds, to clearly see the behavior of the connections; they print the idle and active connection counts as well as the full DataSource properties object. Before the application shut down, the following entries were logged:

Datasource idle connections: 0, active connections: 200

Datasource properties: org.apache.tomcat.jdbc.pool.DataSource@20b2475a{ConnectionPool[defaultAutoCommit=null; defaultReadOnly=null; defaultTransactionIsolation=-1; defaultCatalog=null; driverClassName=com.mysql.jdbc.Driver; maxActive=500; maxIdle=500; minIdle=10; initialSize=10; maxWait=30000; testOnBorrow=true; testOnReturn=false; timeBetweenEvictionRunsMillis=5000; numTestsPerEvictionRun=0; minEvictableIdleTimeMillis=60000; testWhileIdle=false; testOnConnect=false; password=********; url=jdbc:mysql://127.0.0.1:3306/db_name?createDatabaseIfNotExist=true; username=username; validationQuery=SELECT 1; validationQueryTimeout=-1; validatorClassName=null; validationInterval=3000; accessToUnderlyingConnectionAllowed=true; removeAbandoned=false; removeAbandonedTimeout=60; logAbandoned=false; connectionProperties=null; initSQL=null; jdbcInterceptors=null; jmxEnabled=true; fairQueue=true; useEquals=true; abandonWhenPercentageFull=0; maxAge=0; useLock=false; dataSource=null; dataSourceJNDI=null; suspectTimeout=0; alternateUsernameAllowed=false; commitOnReturn=false; rollbackOnReturn=false; useDisposableConnectionFacade=true; logValidationErrors=false; propagateInterruptState=false; ignoreExceptionOnPreLoad=false; }
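For reference, here is a minimal sketch of that periodic logger, assuming the injected bean is the Tomcat pool's org.apache.tomcat.jdbc.pool.DataSource and that scheduling is enabled via @EnableScheduling; the class name is illustrative:

import org.apache.tomcat.jdbc.pool.DataSource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class ConnectionPoolLogger {

    private static final Logger log = LoggerFactory.getLogger(ConnectionPoolLogger.class);

    private final DataSource dataSource; // the Tomcat JDBC pool behind spring-boot-starter-jdbc

    public ConnectionPoolLogger(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Emits the pool state every 15 seconds.
    @Scheduled(fixedRate = 15000)
    public void logPoolState() {
        log.info("Datasource idle connections: {}, active connections: {}",
                dataSource.getIdle(), dataSource.getActive());
        log.info("Datasource properties: {}", dataSource);
    }
}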

After emitting those entries the application shut itself down, and I found the following logs, with no errors before them:

2021-02-03 20:23:02.618  INFO 1 --- [       Thread-4] ationConfigEmbeddedWebApplicationContext : Closing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@8807e25: startup date [Wed Feb 03 19:49:09 GMT 2021]; root of context hierarchy
2021-02-03 20:23:02.623  INFO 1 --- [       Thread-4] o.s.c.support.DefaultLifecycleProcessor  : Stopping beans in phase 0
2021-02-03 20:23:02.643  INFO 1 --- [       Thread-4] o.s.j.e.a.AnnotationMBeanExporter        : Unregistering JMX-exposed beans on shutdown
2021-02-03 20:23:02.647  INFO 1 --- [       Thread-4] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'

A couple of relevant dependencies and their versions:

org.springframework:spring-webmvc:jar:4.3.6.RELEASE:compile
org.springframework.boot:spring-boot-starter-data-jpa:jar:1.5.1.RELEASE:compile
    org.springframework.boot:spring-boot-starter-jdbc:jar:1.5.1.RELEASE:compile
        org.apache.tomcat:tomcat-jdbc:jar:8.5.11:compile
    org.hibernate:hibernate-core:jar:5.0.11.Final:compile
    org.springframework.data:spring-data-jpa:jar:1.11.0.RELEASE:compile
org.springframework.boot:spring-boot-starter-web:jar:1.5.1.RELEASE:compile
org.liquibase:liquibase-core:jar:3.5.1:compile
org.liquibase.ext:liquibase-hibernate5:jar:3.6:compile

Finally, I'm asking for help understanding why the application shuts itself down, and how I could fix it so that it can reach 500 connections.

Muro
  • What are your configuration parameters? https://stackoverflow.com/questions/39002090/spring-boot-limit-on-number-of-connections-created Regardless of that, 200 simultaneous connections seems excessive. Unless you have a multi-socket motherboard with multi-core CPUs running the show, no CPU will handle those simultaneously, and a lot of threads will spend time waiting for a CPU time slot, which might even slow the application down. – Tschallacka Feb 08 '21 at 14:27
  • Run multiple instances of the app (each with 100 connections). If you dockerize it, that should be pretty simple to do (you can run 10 instances or more with minimal effort). – The Impaler Feb 08 '21 at 14:44
  • @Tschallacka That's a good observation. We are running this service on Google Cloud in a Kubernetes cluster, and I've been told by our DevOps team that "CPU usage on the pod is high before the service stops responding to health checks, but it is still well below the CPU we have assigned to each pod. Memory also looks good, well below the limit." The service stops responding to health checks because it shuts down, so with this information I would guess it is not an issue of resources. Regarding the configuration parameters, could you clarify which configuration I should post? – Muro Feb 08 '21 at 14:58
  • @TheImpaler I know this is the optimal solution, yet the business doesn't want to implement auto-scaling just yet; they just want to have 2 instances with 500 connections each :/ – Muro Feb 08 '21 at 14:59
  • @Muro CPU usage is high before it stops responding to health checks. That does **not** surprise me at all. If you have 200 connections, presumably assigned to their own thread pool, and your app has to switch threads constantly, sync up with the datasource to get a connection, decouple, do its work, re-sync with the thread pool, and reset the connection, the thread scheduler is running overtime. Before maxing out, the standard protections apply: the program is not responding, the CPU is not responding, kill all processes and reset. **Always** keep hardware limitations in mind, and know what they are. – Tschallacka Feb 08 '21 at 15:06
  • @Muro Ask yourself how many physical and virtual cores (hyper-threading) are available to your instance. Then ask yourself how many threads your program has; if the number of constantly-under-load threads is higher than the number of cores, you're slowing your application down. Do you have short-lived threads, like network requests, with a beginning and an end? Or constantly running threads? Do you have a thread job queue? All important information. Your hardware is choking, so make sure you know what you have and what stress it causes/can handle. – Tschallacka Feb 08 '21 at 15:10
  • That makes a lot of sense to me; I will discuss that with our DevOps team to find out more about the underlying resources. Thanks. – Muro Feb 08 '21 at 15:19
