I have read the related question:
But I'm still lost. We have two application servers and a database server (all virtual machines provided by a cloud service). Today the database server shut down completely without any warning. We got the cloud service vendor to bring it back online, and we restored our application to a working state.
When we asked for the cause, the cloud service vendor responded with a batch of TCP statistics (around 1500 lines) that look like this (masked for privacy):
ipv4 2 tcp 6 98 TIME_WAIT src=x.x.x.x dst=y.y.y.y sport=z dport=5432 packets=p bytes=b src=y.y.y.y dst=x.x.x.x sport=5432 dport=z packets=p bytes=b [ASSURED] mark=0 secmark=0 use=2
The vendor claims that the server had issues and shut itself down because of too many incoming connections, as evidenced by the high number of TIME_WAIT connections.
However, there was no indication of the time frame over which the statistics were gathered. If they were gathered over a long time range, they can't be used to claim that there were a large number of such connections.
Such a claim is only valid for a snapshot taken at a single point in time (not over a time range), showing that a large number of connections were in the TIME_WAIT state at that moment. Am I right?
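One way to check the vendor's claim against the dump itself is to tally the entries by TCP state and destination port and look at the range of the timeout field. The following is a minimal Python sketch under stated assumptions: the dump uses the Linux `/proc/net/nf_conntrack` line layout shown above (network protocol and number, transport protocol and number, timeout, TCP state, then key=value pairs), and the fifth field (`98` in the sample) is the number of seconds left before the entry expires. The `SAMPLE` lines and the `summarize` function are hypothetical, made up for illustration.

```python
from collections import Counter

# Hypothetical sample lines in the same layout as the vendor's dump (addresses made up).
SAMPLE = """\
ipv4 2 tcp 6 98 TIME_WAIT src=10.0.0.1 dst=10.0.0.9 sport=40001 dport=5432 packets=5 bytes=400 src=10.0.0.9 dst=10.0.0.1 sport=5432 dport=40001 packets=4 bytes=300 [ASSURED] mark=0 secmark=0 use=2
ipv4 2 tcp 6 431999 ESTABLISHED src=10.0.0.2 dst=10.0.0.9 sport=40002 dport=5432 packets=9 bytes=900 src=10.0.0.9 dst=10.0.0.2 sport=5432 dport=40002 packets=8 bytes=800 [ASSURED] mark=0 secmark=0 use=2
ipv4 2 tcp 6 12 TIME_WAIT src=10.0.0.1 dst=10.0.0.9 sport=40003 dport=5432 packets=5 bytes=400 src=10.0.0.9 dst=10.0.0.1 sport=5432 dport=40003 packets=4 bytes=300 [ASSURED] mark=0 secmark=0 use=2
"""


def summarize(lines):
    """Count TCP entries per (state, dport) and track min/max timeout per state."""
    states = Counter()
    timeouts = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines and non-TCP entries (field 2 is the transport protocol name).
        if len(fields) < 6 or fields[2] != "tcp":
            continue
        # Assumed layout: field 4 = seconds until expiry, field 5 = TCP state.
        timeout, state = int(fields[4]), fields[5]
        # The first dport= token belongs to the original (client -> server) direction.
        dport = next((f.split("=", 1)[1] for f in fields if f.startswith("dport=")), None)
        states[(state, dport)] += 1
        lo, hi = timeouts.get(state, (timeout, timeout))
        timeouts[state] = (min(lo, timeout), max(hi, timeout))
    return states, timeouts


if __name__ == "__main__":
    # To use a real dump, replace SAMPLE with e.g. open("dump.txt").read() (hypothetical file name).
    states, timeouts = summarize(SAMPLE.splitlines())
    for (state, dport), n in states.most_common():
        print(f"{state:12} dport={dport}: {n}")
    for state, (lo, hi) in sorted(timeouts.items()):
        print(f"{state:12} remaining timeout {lo}-{hi} s")
```

Run against the full dump, this shows how many of the ~1500 lines are actually in TIME_WAIT and on which port, rather than relying on the raw line count.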
Even granting that there really were a large number of TIME_WAIT connections at a single point in time, is that damaging to the server, and could it grind the server to a halt? Is this how a Denial of Service attack happens?