Consider following list of connections:
+----------+---------+------+------------------------+
| ID | COMMAND | TIME | STATE |
+----------+---------+------+------------------------+
| 87997796 | Sleep | 15 | cleaned up |
| 90850182 | Sleep | 105 | cleaned up |
| 88009697 | Sleep | 38 | delayed commit ok done |
| 88000267 | Sleep | 6 | delayed commit ok done |
| 88009819 | Sleep | 38 | delayed commit ok done |
| 90634882 | Sleep | 21 | cleaned up |
| 90634878 | Sleep | 21 | cleaned up |
| 90634884 | Sleep | 21 | cleaned up |
| 90634875 | Sleep | 21 | cleaned up |
+----------+---------+------+------------------------+
After some short time under minute:
+----------+---------+------+------------------------+
| ID | COMMAND | TIME | STATE |
+----------+---------+------+------------------------+
| 87997796 | Sleep | 9 | cleaned up |
| 88009697 | Sleep | 32 | delayed commit ok done |
| 88000267 | Sleep | 9 | delayed commit ok done |
| 88009819 | Sleep | 31 | delayed commit ok done |
| 90634882 | Sleep | 14 | cleaned up |
| 90634878 | Sleep | 14 | cleaned up |
| 90634884 | Sleep | 14 | cleaned up |
| 90634875 | Sleep | 14 | cleaned up |
+----------+---------+------+------------------------+
8 rows in set (0.02 sec)
enter code here
After I finished writing this stackoverflow post:
+----------+---------+------+------------------------+
| ID | COMMAND | TIME | STATE |
+----------+---------+------+------------------------+
| 87997796 | Sleep | 0 | cleaned up |
| 88009697 | Sleep | 53 | delayed commit ok done |
| 88000267 | Sleep | 0 | delayed commit ok done |
| 88009819 | Sleep | 52 | delayed commit ok done |
| 90634882 | Sleep | 5 | cleaned up |
| 90634878 | Sleep | 5 | cleaned up |
| 90634884 | Sleep | 5 | cleaned up |
| 90634875 | Sleep | 5 | cleaned up |
+----------+---------+------+------------------------+
Context:
This is some 3rd vendor app opening connections (source code isn't available to us, so we don't know details). We know that their connection management is awful , they know it as well. It is awful because connections leak which you can see in first table - 90850182. If others have their timers reset, then this one starts to age infinitely. In older versions of the app it would stay forever. In newer version it is eventually captured by a "patch" which vendor introduced , which effectively cleans connections after the x seconds you specify. So it's "a leak healing patch".
The problem:
We are hosting hundreds of such vendor apps and most of them have much more than 8 connections as they have more traffic. That results in disgusting number(talking thousands) of connections we have to maintain. About 80% of connections sit in "cleaned up" state and under 120 seconds (cleaned eventually by aforementioned configurable app parameter).
This is all handled by Aurora RDS and AWS engineers told us that if the app doesn't close properly connections the standard "wait_timeout" isn't going to work. Well, "wait_timeout" becomes useless decoration in AWS Aurora, but let us take it with Jeff in other thread/topic.
So regardless, we have this magic configurable parameter from third party vendor set on this obscure app which controls eviction of stale connections and it works.
The questions:
Is it safe to evict connections which are in "cleaned up" state immediately?
At the moment this happens after 120 seconds which results in huge number of such connections. Yet in the tables above you can see that the timers are reset meaning that something is happening to these connections and they are not entirely stale. I.e. connection pooling of the app "touches" them for further re-use?
I don't posses knowledge of connection pools inner guts as how they are seen from within database. Are all reserved connections of a connection pool by default are "sleeping" in "cleaned up" state?
So say if you start cleaning too much, you will fight connection pool aggressively creating more to replenish?
Or reserved connections have some different state?
Even if you don't fully understand the context I'd expect veteran DBA or connection pool library maintainer to help with such questions. Otherwise will get my hands dirty and answer this myself eventually, would try apache connection pool, hikari, observe them and try to kill their idle connections (simulating magic parameter) and try this 3rd party app connection with 0 seconds magic parameter, see if it still works.
Appreciate your time :bow:.