I have a 3-node cluster running on GKE. All the nodes are preemptible, meaning they can be killed at any time and generally do not live longer than 24 hours. When a node is killed, the autoscaler spins up a new node to replace it, which usually takes a minute or so.
In my cluster I have a deployment with its replicas set to 3. My intention is that the pods will be spread across the nodes, so that my application keeps running as long as at least one node in the cluster is alive.
I've used the following affinity configuration so that pods prefer running on hosts different from those already running pods for that deployment:
```yaml
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - my-app
          topologyKey: kubernetes.io/hostname
        weight: 100
```
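For context, this anti-affinity term only takes effect if the pods carry the label that the `labelSelector` matches. A minimal sketch of the surrounding Deployment (the name and image below are illustrative assumptions, not from my actual manifest):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app              # must match the pod template labels
  template:
    metadata:
      labels:
        app: my-app            # the label the podAntiAffinity term matches on
    spec:
      containers:
      - name: my-app
        image: my-app:latest   # placeholder image
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - my-app
              topologyKey: kubernetes.io/hostname
```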
When I scale my application from 0 this seems to work as intended. But in practice the following happens:
- Let's say pods `A`, `B` and `C` belonging to the `my-app` replicaset are running on nodes `1`, `2` and `3` respectively. So the state would be:

```
1 -> A
2 -> B
3 -> C
```
- Node 3 is killed taking pod C with it, resulting in 2 running pods in the replicaset.
- The scheduler automatically starts to schedule a new pod to bring the replicaset back up to 3.
- It looks for a node without any pods for `my-app`. As the autoscaler is still in the process of starting a replacement node (`4`), only `1` and `2` are available.
- It schedules the new pod `D` on node `1`.
- Node `4` eventually comes online, but as `my-app` already has all its pods scheduled, no pods run on it. The resultant state is:

```
1 -> A, D
2 -> B
4 -> -
```
This is not the ideal configuration. The problem arises because there is a delay creating the new node, and the scheduler is not aware that one will be available very soon.
Is there a better configuration that can ensure the pods are always distributed across the nodes? I was thinking a directive like `preferredDuringSchedulingPreferredDuringExecution` might do it, but that doesn't exist.
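For reference, the closest variant that does exist on the scheduling side is the hard form, `requiredDuringSchedulingIgnoredDuringExecution`. A sketch of what it would look like with the same selector (note the trade-off: the replacement pod would stay `Pending` until a node without a `my-app` pod, such as `4`, joins the cluster):

```yaml
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - my-app
        topologyKey: kubernetes.io/hostname
```

Unlike the `preferred` form, the `required` form takes a list of plain `podAffinityTerm`s with no `weight`, since it is a hard filter rather than a score. But I'm unsure whether hard anti-affinity is the right answer here, hence the question.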