I want to scale my deployment depending on the number of requests. Each pod can only handle one request at a time. Scaling up is no problem, but when I scale down I want to make sure I am not killing a pod that is working right now (e.g. encoding a large file).
I have the following pods:
- Pod 1 (created 10 min ago, has a task)
- Pod 2 (created 5 min ago, is free)
- Pod 3 (created 1 min ago, has a task)
If I reduce the replica count, Kubernetes will kill pod 3. It does not care whether the pod is busy or not. I could manually kill pod 2, so that Kubernetes starts a new one:
- Pod 1 (created 10 min ago, has a task)
- Pod 3 (created 1 min ago, has a task)
- Pod 4 (created just now, is free)
Once I know pod 2 has been killed, I could reduce the replica count, so that pod 4 is killed before it gets a task. But this solution sounds very ugly, because someone else has to tell pod 2 to shut down.
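To make the scenario concrete, here is a toy Python model of the behaviour I am describing. It is only a sketch: `Pod`, `scale_down_newest_first`, and the "newest pod goes first" rule are my own illustrative assumptions based on what I observe, not Kubernetes code or its real selection logic.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    age_minutes: float  # time since the pod was created
    busy: bool          # True if the pod is currently working on a task


def scale_down_newest_first(pods, replicas):
    """Hypothetical model: keep the oldest `replicas` pods, kill the rest.

    The `busy` flag is deliberately ignored, mirroring the behaviour
    described above.
    """
    oldest_first = sorted(pods, key=lambda p: p.age_minutes, reverse=True)
    return oldest_first[:replicas], oldest_first[replicas:]


pods = [
    Pod("pod-1", 10, busy=True),
    Pod("pod-2", 5, busy=False),
    Pod("pod-3", 1, busy=True),
]

# Naive scale-down from 3 to 2 replicas: the newest pod is killed,
# even though it is busy.
_, killed = scale_down_newest_first(pods, 2)
naive_killed = [p.name for p in killed]

# Workaround: delete the idle pod-2 by hand, let a replacement (pod-4)
# start, then scale down so that the fresh, idle pod-4 is removed.
after_delete = [p for p in pods if p.name != "pod-2"]
after_delete.append(Pod("pod-4", 0, busy=False))
kept, killed = scale_down_newest_first(after_delete, 2)
workaround_kept = [p.name for p in kept]
workaround_killed = [p.name for p in killed]

print("naive scale-down kills:", naive_killed)
print("workaround keeps:", workaround_kept, "and kills:", workaround_killed)
```

In this model the naive scale-down removes the busy pod 3, while the workaround only ever removes idle pods, at the cost of an extra manual step and a short-lived pod 4.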
So Kubernetes kills the most recently created pods first, but is it possible to tell it that a pod is busy and that it has to wait before killing that pod?