
Thanks in advance for taking the time to read this.

I'm experimenting with Kubernetes and use Ansible for all interactions with my cluster. I have some playbooks that successfully deploy applications.

The main Ansible module I use for deployment is k8s, which lets me apply my YAML configs.

I can successfully wait until a Deployment completes using:

k8s:
  state: present
  src: config.yaml   # 'src' loads a manifest file; 'definition' expects inline YAML
  wait: yes
  wait_timeout: 10

But, unfortunately, the same trick doesn't work out of the box with Kubernetes Jobs. The module simply exits immediately, which is in fact clearly described in the module documentation:

For resource kinds without an implementation, wait returns immediately unless wait_condition is set.

To cover such cases, the module documentation suggests specifying:

wait_condition:
  reason: REASON
  type: TYPE
  status: STATUS
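For illustration (this example is mine, not from the docs verbatim): for a Pod one could wait on its Ready condition. Ready is a standard Pod condition type in the core/v1 API, and the status value is quoted because condition statuses are strings in the API, not booleans.

```yaml
# Sketch: wait for a Pod to become Ready (condition names per the core/v1 API)
wait: yes
wait_timeout: 60
wait_condition:
  type: Ready
  status: "True"
```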

The doc also says:

The possible types for a condition are specific to each resource type in Kubernetes. See the API documentation of the status field for a given resource to see possible choices.

I checked the API specification and found the same thing stated in the following answer:

the only type values are "Complete" and "Failed", and that they may have a "True" or "False" status

So, my QUESTION is simple: does anyone know how to use this wait_condition properly? Have you tried it already (as of now, it's a relatively new feature)?

Any ideas where to look are also appreciated.

UPDATE:

Here's the kind of workaround I use now:

- name: Run Job
  k8s:
    state: present
    src: job_definition.yml   # 'src' loads the manifest file

- name: Wait Until Job Is Done
  k8s_facts:
    kind: Job
    name: job_name
  register: job_status
  # status.active is absent once no pods are running, so default it to 0
  until: job_status.resources[0].status.active | default(0) != 1
  retries: 10
  delay: 10
  ignore_errors: yes

- name: Get Final Job Status
  k8s_facts:
    name: job_name
    kind: Job
  register: job_status

- fail:
    msg: "Job has failed!"
  when: job_status.resources[0].status.failed | default(0) == 1
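An alternative sketch of the same polling idea (my own variant, not tested against the original cluster) waits on the succeeded counter instead of active. It assumes the Job object already exists; job_name is a placeholder, and the default filter guards against the field being absent before any pod finishes.

```yaml
- name: Wait Until Job Succeeds
  k8s_facts:
    kind: Job
    name: job_name
  register: job_status
  # status.succeeded only appears once at least one pod has finished
  until: (job_status.resources[0].status.succeeded | default(0)) >= 1
  retries: 10
  delay: 10
```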

But it would be better to use the module's built-in feature directly.

4 Answers


(The other answers were so close that I'd edit them, but it says the edit queues are full.) The status in a Job condition is a string. In YAML, a bare True is resolved to the boolean type, so you need to quote it to get the string. You can see this in the YAML output of the Job:

$ kubectl -n demo get job jobname -o yaml
apiVersion: batch/v1
kind: Job
metadata: ...
spec: ...
status:
  completionTime: "2021-01-19T16:24:47Z"
  conditions:
  - lastProbeTime: "2021-01-19T16:24:47Z"
    lastTransitionTime: "2021-01-19T16:24:47Z"
    status: "True"
    type: Complete
  startTime: "2021-01-19T16:24:46Z"
  succeeded: 1

Therefore, to wait for completion you need to quote the status in wait_condition:

  k8s:
    wait: yes
    wait_condition:
      type: Complete
      status: "True"

(The wait parameter expects a boolean; Ansible's YAML parser and its argument validation both accept yes as true.)
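Putting it together, a complete task based on this answer might look like the following; the file name, namespace, and timeout are assumptions for the sake of the example.

```yaml
- name: Run Job and wait for it to complete
  k8s:
    state: present
    namespace: demo
    src: job_definition.yml   # assumed manifest file
    wait: yes
    wait_timeout: 300         # allow enough time for the Job to finish
    wait_condition:
      type: Complete
      status: "True"          # quoted: the condition status is a string
```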

Marko Kohtala

wait_condition works for me with Jobs, as long as timeout/type/status are set appropriately based on your job's average processing time:

        wait: yes
        wait_timeout: 300
        wait_condition:
          type: Complete
          status: True
flabatut
  • Tried with the latest Ansible version. The result is the same: the module simply hangs. I'll try again after the next release, but until now it doesn't solve my problem for some reason. Anyway, thanks for your answer. – Konstantin Dobroliubov Sep 20 '19 at 09:50
  • Can't edit ("suggested edit queue full"), but I think the problem could be that the status is not a boolean but a string. The wait_condition should read status: "True", not status: True. – Marko Kohtala Jan 18 '21 at 13:19

Kubernetes documentation specifies that:

As pods successfully complete, the Job tracks the successful completions. When a specified number of successful completions is reached, the task (ie, Job) is complete.

Based on this and the API specification you already linked, we can assume that the Job will have the condition type Complete set to True once it has been executed successfully as many times as requested.

Hence:

wait_condition:
  type: Complete
  status: True

Should do the "job".

As stated in the k8s plugin code, reason is ignored when it is not specified.

I didn't test it; this is just based on the code and documentation, so it would be nice if you could confirm whether it works.

Daniel Szot
  • Daniel, thanks a lot for your time. Tried that as well, but, unfortunately, the result remains the same: no error messages, the k8s module simply hangs. So it doesn't do the trick. – Konstantin Dobroliubov Aug 19 '19 at 14:59

I believe the wait_condition type must be related to the resource type you are using: type: Complete, for example, applies to Deployments, but the wait hangs when a ServiceAccount or StatefulSet is in the YAML.

Wiper