14

I have a working Ansible playbook that polls the output of kubectl get pods -o json until the pod is in the Running state. Now I want to extend this to multiple pods. The core issue is that the JSON result of the kubectl query is a list: I know how to access the first item, but not all of the items.

- name: wait for pods to come up
  shell: kubectl get pods -o json
  register: kubectl_get_pods
  until: kubectl_get_pods.stdout|from_json|json_query('items[0].status.phase') == "Running"
  retries: 20

The JSON object looks like this:

[  { ...  "status": { "phase": "Running" } },
   { ...  "status": { "phase": "Running" } },
   ...
]

Using [0] to access the first item worked for handling one object in the list, but I can't figure out how to extend it to multiple items. I tried [*], which did not work.

Martin Guthrie
  • Have you tried loops? https://docs.ansible.com/ansible/2.7/user_guide/playbooks_loops.html –  Nov 07 '18 at 22:19

7 Answers

22

The kubectl wait command

Kubernetes introduced the kubectl wait command in v1.11:

CHANGELOG-1.11:

  • kubectl wait is a new command that allows waiting for one or more resources to be deleted or to reach a specific condition. It adds a kubectl wait --for=[delete|condition=condition-name] resource/string command.

CHANGELOG-1.13:

  • kubectl wait now supports condition value checks other than true using --for condition=available=false

CHANGELOG-1.14:

  • Expanded kubectl wait to work with more types of selectors.
  • kubectl wait command now supports the --all flag to select all resources in the namespace of the specified resource types.

It is not intended to wait for phases, but for conditions. I think that waiting for conditions is more precise than waiting for phases. A Pod can report the following conditions:

  • PodScheduled: the Pod has been scheduled to a node;
  • Ready: the Pod is able to serve requests and should be added to the load balancing pools of all matching Services;
  • Initialized: all init containers have completed successfully;
  • ContainersReady: all containers in the Pod are ready.

Using kubectl wait with Ansible

Suppose that you are automating a Kubernetes install with kubeadm + Ansible, and need to wait for the installation to complete:

- name: Wait for all control-plane pods become created
  shell: "kubectl get po --namespace=kube-system --selector tier=control-plane --output=jsonpath='{.items[*].metadata.name}'"
  register: control_plane_pods_created
  until: item in control_plane_pods_created.stdout
  retries: 10
  delay: 30
  with_items:
    - etcd
    - kube-apiserver
    - kube-controller-manager
    - kube-scheduler

- name: Wait for control-plane pods become ready
  shell: "kubectl wait --namespace=kube-system --for=condition=Ready pods --selector tier=control-plane --timeout=600s"
  register: control_plane_pods_ready

- debug: var=control_plane_pods_ready.stdout_lines

Result Example:

TASK [Wait for all control-plane pods become created] ******************************
FAILED - RETRYING: Wait all control-plane pods become created (10 retries left).
FAILED - RETRYING: Wait all control-plane pods become created (9 retries left).
FAILED - RETRYING: Wait all control-plane pods become created (8 retries left).
changed: [localhost -> localhost] => (item=etcd)
changed: [localhost -> localhost] => (item=kube-apiserver)
changed: [localhost -> localhost] => (item=kube-controller-manager)
changed: [localhost -> localhost] => (item=kube-scheduler)

TASK [Wait for control-plane pods become ready] ********************************
changed: [localhost -> localhost]

TASK [debug] *******************************************************************
ok: [localhost] => {
    "control_plane_pods_ready.stdout_lines": [
        "pod/etcd-localhost.localdomain condition met", 
        "pod/kube-apiserver-localhost.localdomain condition met", 
        "pod/kube-controller-manager-localhost.localdomain condition met", 
        "pod/kube-scheduler-localhost.localdomain condition met"
    ]    
}
Eduardo Baitello
  • 3
    This is one of the most comprehensive answers I've read in a while, covers all the background, links to relevant docs, has very precise examples... just wanted to say thanks, and works a treat. – geerlingguy Dec 18 '19 at 16:59
  • @geerlingguy it's awesome to hear it from you! I use some of your Ansible roles/playbook from Github and a lot of what I learned on Ansible was inspired by them. I owe many thanks to you too! – Eduardo Baitello Dec 18 '19 at 17:43
  • 1
    Thanks! It helps a lot. But there is one concern. The first step checks whether pods with the names given in the list have been created, so we have to provide a list with the exact pod names. What should we do in installation scenarios where the pod names contain random strings, like metrics-server-7566d596c8-4zzrk? In my case, instead of checking pod state, I set the condition to verify that deployments are in the 'Available' state, because I can provide exact names for deployments or services. – AnjK Sep 17 '20 at 06:33
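
For workloads whose pod names carry generated suffixes, the Deployment-based check described in the comment above could look roughly like the sketch below. The namespace, the Deployment name metrics-server, the timeout and the retry values are assumptions chosen for illustration. The until loop is there because kubectl wait may exit with an error when the resource does not exist yet.

```yaml
# Sketch only: "kube-system" and "metrics-server" are assumed names.
- name: Wait for the Deployment to become Available
  ansible.builtin.command: >-
    kubectl wait --namespace=kube-system
    --for=condition=Available
    deployment/metrics-server
    --timeout=300s
  register: deployment_available
  # Retry in case the Deployment has not been created yet.
  until: deployment_available.rc == 0
  retries: 5
  delay: 15
  # Waiting does not change anything on the cluster.
  changed_when: false
```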
12

I would try something like this (works for me):

tasks:
- name: wait for pods to come up
  shell: kubectl get pods -o json
  register: kubectl_get_pods
  until: kubectl_get_pods.stdout|from_json|json_query('items[*].status.phase')|unique == ["Running"]

You are basically getting the phases of all the pods and reducing them to a list of unique values, and the task won't complete until that list is exactly ["Running"]. So, for example, if not all of your pods are running yet, you will get something like ["Running", "Pending"].
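
A possible variation on the same idea, shown here as a sketch rather than a tested recipe: it uses only Jinja2/Ansible built-in filters instead of json_query (which needs the jmespath Python library), and it skips pods in the Succeeded phase, since completed Job pods would otherwise keep the unique list from ever equalling ["Running"]. The retries and delay values are assumptions.

```yaml
- name: wait for pods to come up (ignoring completed pods)
  shell: kubectl get pods -o json
  register: kubectl_get_pods
  # ['items'] is used instead of .items, which would resolve to the dict method.
  until: >-
    (kubectl_get_pods.stdout | from_json)['items']
    | rejectattr('status.phase', 'equalto', 'Succeeded')
    | map(attribute='status.phase')
    | unique | list == ['Running']
  retries: 20   # assumed value
  delay: 10     # assumed value
  # Polling is read-only, so never report "changed".
  changed_when: false
```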

Rico
4

The community.kubernetes.k8s module for Ansible has built-in wait functionality.

The problem, however, is that different resources have different wait_condition types. If you are using a deployment then, as seen below, type: Complete works well as long as you set the correct timeout bounds, but if the YAML also contains other resource types, such as serviceaccounts, it will most likely hang.

- name: Deploy the stack
  community.kubernetes.k8s:
    state: present
    src: "{{ dir }}my.yaml"
    wait: yes
    wait_sleep: 10
    wait_timeout: 600
    wait_condition:
      type: Complete
      status: "True"
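
One way to sidestep the mixed-resource problem described above is to apply each resource type from its own file and give each a condition that its kind actually reports. The sketch below assumes a hypothetical file deployment.yaml that contains only a Deployment, and it uses the Available condition that Deployments publish in their status. It is written against kubernetes.core.k8s, the successor collection to community.kubernetes; treat the parameter values as placeholders.

```yaml
- name: Deploy the Deployment and wait until it is Available
  kubernetes.core.k8s:
    state: present
    # Hypothetical file holding only the Deployment manifest.
    src: "{{ dir }}deployment.yaml"
    wait: true
    wait_sleep: 10
    wait_timeout: 600
    wait_condition:
      # "Available" is a condition type reported by Deployments.
      type: Available
      status: "True"
```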
Wiper
4

You can use kubernetes.core.k8s_info from the kubernetes.core collection.

For example, wait for cert-manager to be up in the cert-manager namespace:

- name: Wait until cert-manager is up
  kubernetes.core.k8s_info:
    kubeconfig: "{{ kubeconfig }}"
    api_version: v1
    kind: Pod
    namespace: cert-manager
  register: pod_list
  until: pod_list|json_query('resources[*].status.phase')|unique == ["Running"]
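
Without retries and delay, until falls back to Ansible's defaults of 3 retries with a 5-second delay, which may be too short for a fresh install. The sketch below adds explicit values (assumed, tune them for your cluster) and a guard for the case where no pods exist yet.

```yaml
- name: Wait until cert-manager is up
  kubernetes.core.k8s_info:
    kubeconfig: "{{ kubeconfig }}"
    api_version: v1
    kind: Pod
    namespace: cert-manager
  register: pod_list
  # Require at least one pod, and every pod phase to be Running.
  until: >-
    (pod_list.resources | default([]) | length > 0)
    and
    (pod_list.resources | map(attribute='status.phase') | unique | list == ['Running'])
  retries: 30   # assumed value
  delay: 10     # assumed value
```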
Xavi Martínez
0

Kubernetes v1.23.0 (changelog) added the ability for kubectl wait to wait on an arbitrary JSON path.

So it seems that kubectl wait can now be used to wait for status phases as well.

Wait for the status phase of the pod "busybox1" to be "Running":

kubectl wait --for=jsonpath='{.status.phase}'=Running pod/busybox1

You can use this command in an Ansible playbook task:

- name: Wait for the pods to come up with status 'Running'
  shell: "kubectl wait -n kube-system --for=jsonpath='{.status.phase}'=Running pods --selector tier=control-plane --timeout=120s"
  register: control_plane_pods_running

- debug: var=control_plane_pods_running.stdout_lines
AnjK
0

A one-liner shell command would do:

kubectl wait pod --all -n ${the_namespace} --timeout=3m \
--for=condition=ready --field-selector=status.phase!=Succeeded

It will wait up to 3 minutes for all pods in the namespace to become Ready. The field selector excludes completed pods whose phase is Succeeded (the capital "S" is important).
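
Wrapped in an Ansible task, the same command might look like the sketch below. The variable target_namespace is a hypothetical name standing in for ${the_namespace}.

```yaml
- name: Wait for all non-completed pods in the namespace to become Ready
  ansible.builtin.command: >-
    kubectl wait pod --all
    --namespace={{ target_namespace }}
    --timeout=3m
    --for=condition=Ready
    --field-selector=status.phase!=Succeeded
  register: pods_ready
  # Waiting does not modify the cluster.
  changed_when: false

- debug: var=pods_ready.stdout_lines
```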

Noam Manos
0

In the end, I went with the answer from https://stackoverflow.com/users/2989261/rico, as it seems the most reliable (kubectl wait --for=condition=ready was not reliable with my kubectl version 1.23.5). I just added a selector for all my pods (a good practice, since it filters out Completed jobs), like this:

tasks:
- name: wait for pods from group app-xx to come up
  shell: kubectl get pods --selector=app-group=app-xx -o json
  register: kubectl_get_pods
  until: kubectl_get_pods.stdout|from_json|json_query('items[*].status.phase')|unique == ["Running"]
  retries: 12
  delay: 5
ome13