Docker Swarm and Kubernetes are two systems for managing applications across multiple nodes. If a node is being drained or its load is very high, they start a procedure to maintain the desired state described in the requirements.
Of course, when they manage an application across the infrastructure, they have to make choices about how to modify its state. How are those decisions made so that they don't damage the system?
I'm not asking when the actions are triggered (system analysis), but how we can prove that the decision taken is the best one to resolve the problem. Is there any documentation on this? I can't find anything that covers this topic.
For example: I have a node whose resources are almost entirely free. Then, at a certain moment, its resources become insufficient, remain insufficient for just one second, and then become free again. If the manager migrates applications from that node to another one because that one second of insufficient resources triggered the migration functions, it will probably create more problems than it solves, since the resource shortage has already passed and no migration was really required.
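
To make the scenario concrete, here is a minimal sketch in Python. It is not how Docker Swarm or Kubernetes actually decide anything: the threshold value, the one-sample-per-second usage data, and both function names are hypothetical and exist only to illustrate the difference between reacting to a single bad sample and reacting only to a sustained shortage.

```python
from typing import List

# Hypothetical: fraction of node resources in use above which
# the node counts as having "insufficient resources".
THRESHOLD = 0.9


def naive_trigger(samples: List[float]) -> bool:
    """Trigger a migration as soon as any single sample exceeds the threshold."""
    return any(usage > THRESHOLD for usage in samples)


def sustained_trigger(samples: List[float], min_consecutive: int) -> bool:
    """Trigger only if usage stays above the threshold for
    min_consecutive samples in a row."""
    run = 0
    for usage in samples:
        # Extend the current run of over-threshold samples, or reset it.
        run = run + 1 if usage > THRESHOLD else 0
        if run >= min_consecutive:
            return True
    return False


# Assumed data: one sample per second, node mostly idle with a one-second spike.
spike = [0.1, 0.1, 0.98, 0.1, 0.1]

print(naive_trigger(spike))         # True: migrates because of one bad second
print(sustained_trigger(spike, 3))  # False: the spike is too short to count
```

With the naive rule, the one-second spike alone is enough to start a migration that is no longer needed by the time it runs. What I would like to know is whether the real managers guard against this kind of decision, and where that is documented.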