I want to maintain a state file to keep track of whether a role ran or not. Ansible's retry file does not make sense here, as the multiple roles in my playbook call a bunch of different APIs.

In a multi-DC setup, a given playbook is iterated. If something fails and the playbook exits, Ansible's default retry file did not resume where it should have resumed from.

I wanted to know whether meta/main.yml can somehow read a dynamic state file that keeps track of DC and role, so that we can read it and determine whether the given role can execute or not. We can definitely put a bunch of `when` conditionals on every task in the role's main.yml. Is there a better way?

1 Answer

... Ansible's default retry file did not resume where it should have resumed from.

The retry files list the hosts that failed during an Ansible execution, so they would have a similar effect to the state files you mentioned. As each execution is treated as independent and atomic, you would need to execute the whole playbook (or role) again so that variables and task state are in memory and the result is consistent.

Ansible playbooks are expected to be idempotent: you should be able to execute them several times, and the resulting state should be the same.
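As a rough illustration of what idempotent means for a command-style task, here is a minimal sketch that uses the `creates` parameter of the shell module so the command is skipped on a re-run once it has succeeded for a host. The script `/usr/local/bin/register_host.sh` and the marker path under `/var/tmp` are hypothetical names, not taken from the question.

```yaml
# Sketch only: register_host.sh and the marker path are hypothetical.
- name: Register each host through a local script, at most once per host
  ansible.builtin.shell:
    # The marker file is only touched when the script exits successfully.
    cmd: >-
      /usr/local/bin/register_host.sh {{ inventory_hostname }}
      && touch /var/tmp/registered-{{ inventory_hostname }}
    # If the marker already exists, Ansible skips the command entirely.
    creates: "/var/tmp/registered-{{ inventory_hostname }}"
  delegate_to: localhost
```

With a guard like this, re-running the whole playbook repeats only the work that did not complete, which is usually what a resume feature is meant to achieve.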

Have you identified why the execution failed on those hosts?

Carlos Monroy Nieblas
  • If you have a task, let's say using the shell module, that takes {{ ansible_host }} as an argument but is delegated to localhost, and that task fails for, say, the 5th out of 10 hosts, the retry file didn't resume from the failed host. – starry_cloud Oct 30 '22 at 05:10
  • With the limited data provided, the response will be vague and too general. In most cases, operations using the shell module can be replaced with Ansible modules, which are usually more stable and make error handling easier. Have you re-executed the playbook with `--limit @`? Have you determined the cause of the failed execution? – Carlos Monroy Nieblas Oct 30 '22 at 05:24
  • This is a challenging topic, as there is no solution that fits all use cases; it is a recurring debate, for example in this thread (https://stackoverflow.com/questions/29900096/how-to-continue-execution-on-failed-task-after-fixing-error-in-playbook) from 7 years ago – Carlos Monroy Nieblas Oct 30 '22 at 05:29
  • @starry_cloud Ansible never resumes where it left off, as it is stateless. You retry all the tasks on the failed host since, as already mentioned, those tasks are expected to be idempotent (i.e. produce the same expected state on the target system and change things only if needed). What you are trying to implement in your meta file should in fact be implemented directly in each role: they should be idempotent. – Zeitounator Oct 30 '22 at 05:37
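
For roles that wrap API calls which cannot easily be made idempotent, the per-DC, per-role bookkeeping described in the question could be approximated in the playbook itself rather than in meta/main.yml. The following is only a sketch under stated assumptions: the role name `provision_api`, the variable names `dc_name` and `state_dir`, and the marker file layout are all hypothetical.

```yaml
# Sketch only: role name, variables and marker layout are hypothetical.
- name: Run a role per DC, skipping it when a completion marker exists
  hosts: localhost
  gather_facts: false
  vars:
    dc_name: dc1
    state_dir: /var/tmp/ansible-state
  tasks:
    - name: Ensure the state directory exists
      ansible.builtin.file:
        path: "{{ state_dir }}"
        state: directory
        mode: "0755"

    - name: Check whether the role already completed for this DC
      ansible.builtin.stat:
        path: "{{ state_dir }}/{{ dc_name }}-provision_api.done"
      register: role_marker

    - name: Run the role only when no marker exists
      ansible.builtin.include_role:
        name: provision_api
      when: not role_marker.stat.exists

    # Reached only if the role finished without a failure, because a
    # failed task stops the play for this host before this point.
    - name: Record that the role completed for this DC
      ansible.builtin.copy:
        content: "completed\n"
        dest: "{{ state_dir }}/{{ dc_name }}-provision_api.done"
        mode: "0644"
      when: not role_marker.stat.exists
```

Because `include_role` is dynamic, the `when` condition is evaluated once on the include itself, which avoids repeating the conditional on every task inside the role. The trade-off is that the marker files become extra state to maintain, so idempotent tasks inside each role remain the more robust option.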