
When writing and debugging Ansible playbooks, the typical workflow is as follows:

  1. ansible-playbook ./main.yaml
  2. Playbook fails on some task
  3. Fix the task and repeat step 1, waiting for all of the previous tasks to execute again, which takes a lot of time

Ideally, I'd like to resume execution at the failed task, with the inventory and all facts collected by the previous tasks still available. Is that even possible? How can I make writing and debugging playbooks faster?

udondan
Sergey Alaev

3 Answers


Take a look at "Executing playbooks for troubleshooting" in the Ansible documentation. If you want to start executing your playbook at a particular task, you can do so with the --start-at-task option:

ansible-playbook playbook.yml --start-at-task="install packages"

The above will start executing your playbook at a task named “install packages”.

Alternatively, take a look at this previous answer: "How to run only one task in ansible playbook?"

Finally, when a play fails, it usually gives you something along the lines of:

PLAY RECAP ******************************************************************** 
           to retry, use: --limit @/home/user/site.retry

Use that --limit command and it should retry from the failed task.
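In practice, the retry is just the original command with --limit @<file>.retry appended. Below is a minimal sketch as a shell function; `run_with_retry` is a hypothetical name, and it assumes retry files are enabled (the `retry_files_enabled` setting in ansible.cfg, which is not on by default in every Ansible version) and are written next to the playbook as `<playbook name>.retry`. Because the .retry file records hosts only, the second run executes every task again on those hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a playbook once and, if it fails, re-run it with
# --limit @<playbook>.retry so that only the failed hosts are targeted.
# Assumes retry files are enabled and written next to the playbook.
run_with_retry() {
  local playbook="$1"
  shift
  # site.yml -> site.retry (strip the extension, append .retry)
  local retry="${playbook%.*}.retry"

  # First attempt; extra arguments (inventory, tags, ...) are passed through.
  ansible-playbook "$playbook" "$@" && return 0

  if [ ! -f "$retry" ]; then
    echo "playbook failed and no retry file found at $retry" >&2
    return 1
  fi

  # Same command plus --limit. Every task runs again on the failed hosts,
  # because the .retry file stores host names only, not where each failed.
  ansible-playbook "$playbook" "$@" --limit "@$retry"
}
```

Usage: `run_with_retry site.yml -i inventory`.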

Community
Mxx
  • Thanks for the list of options, but AFAIK --limit drops registered variables and custom facts, so it is, well, of limited use. – Sergey Alaev Apr 28 '15 at 11:53
  • The `.retry` file only contains the failed hosts; it doesn't store where exactly each host failed. – Florian Brucker May 31 '16 at 06:07
  • @FlorianBrucker what a shame – igor Sep 09 '16 at 11:28
  • The option `--start-at-task` is broken, because it only works if you never use the `when` clause in any playbook. If the first task in the play registers its result for the second task, which uses it via `when: first.changed`, the second task will never be executed if you start with the second task: the condition normally set by the first task is never set. – ceving Nov 08 '16 at 13:11
  • As of Ansible 2.2.1.0, `--start-at-task` does not work for tasks defined within roles. https://github.com/ansible/ansible/issues/15735 – Simon Woodside Feb 11 '17 at 22:20
  • To use this within roles, use static includes in your Ansible code. See [comment on that issue](https://github.com/ansible/ansible/issues/15735#issuecomment-282205336) from the Ansible team. – RichVel Aug 01 '17 at 13:00
  • What's the full command to retry? Do we use the exact same command and just append the '--limit ...' option? – Leo Ufimtsev Aug 08 '18 at 01:01

Future readers:

The --limit @/home/user/site.retry option would not help in such a scenario: the .retry file only stores the failed hosts and nothing more, so it will just execute all tasks against the failed hosts.

If you are using the latest version (Ansible 2.x), --start-at-task does not work for tasks defined inside roles.

You can achieve a similar effect with the --step flag, e.g. ansible-playbook playbook.yml --step. With --step, Ansible asks you before executing each task, and you can choose (N)o/(y)es/(c)ontinue.

With this approach you can selectively execute tasks when needed and, after your fixes, continue from the point where the play failed.
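--step can also be combined with --start-at-task, so that after a fix you jump straight to the failed task and then confirm each task from there. A minimal sketch; `resume_from` is a hypothetical wrapper name, and the task name has to match the one printed in the TASK [...] banner. Tasks before the starting point are skipped, so anything they would have registered is undefined.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: start at a named task, then prompt before every task.
# At each prompt: (N)o skips the task, (y)es runs it, (c)ontinue runs the
# rest of the play without asking again.
resume_from() {
  local playbook="$1" task="$2"
  shift 2
  # Caveat: tasks before $task are skipped, so variables they would have
  # registered (e.g. ones used in `when:` conditions) are not defined.
  ansible-playbook "$playbook" --start-at-task="$task" --step "$@"
}
```

Usage: `resume_from playbook.yml "install packages" -i inventory`.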

Segmented
  • This approach to use --step – Jeremy Whiting Aug 09 '18 at 09:28
  • Re-commented due to an edit timeout. This approach to use --step is to be repeated until all the failures are solved. What I wish for is for the state of the Ansible playbook(s) to be persisted to disk, so that when --start-at-task is used, the state is loaded from disk. This is a needed feature, imo. – Jeremy Whiting Aug 09 '18 at 09:44
  • @JeremyWhiting I would imagine this would be case to use: `--start-at-task` directive. Here is a [quickstart link](https://docs.ansible.com/ansible/2.5/user_guide/playbooks_startnstep.html) – Segmented Aug 09 '18 at 14:44
  • Thanks for the suggestion, but that's not suitable either. As I mentioned, there are many fact-gathering steps the playbooks depend on to work. Ignoring all those prior steps is not an option. I need a "resume" feature. – Jeremy Whiting Aug 13 '18 at 08:00
  • @JeremyWhiting Ah true! Sorry I overlooked that. – Segmented Aug 13 '18 at 08:03
  • `--start-at-task` does work with roles. The required format is `--start-at-task='<role name> : <task name>'`. – Flux Jan 30 '19 at 10:15

Future Future readers:

As of Ansible 2.4.2.0, --start-at-task works for tasks defined in the roles I created.

The Ansible team is not willing to address this issue; they suggest keeping your roles idempotent and replaying the entire play, and I don't have time for that. My roles don't use a massive amount of facts the way @JeremyWhiting's do, so I can use the --start-at-task feature.

Still, this is a manual process, so I wrote an Ansible RPM and added a "Resume" feature to it that follows these basic steps:

  • Enable the ansible log via /etc/ansible/ansible.cfg (uncomment log_path)
  • Clear the log before each run
  • After a failure, the "Resume" feature greps this log for the last "TASK" line, and uses sed to get what is inside the "[]"
  • Then it simply re-runs the last play with --start-at-task="$start_at_task"
  • Ensure that you have "any_errors_fatal: true" in your roles to stop the play at the failing task you wish to resume from
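The steps above can be sketched as a small Bash script. This is not Trent's actual tool: it assumes log_path is set in ansible.cfg (for example log_path = /var/log/ansible.log under [defaults]), that task banners appear in the log as TASK [name] ***, and `last_task`/`resume` are hypothetical names. For role tasks the banner shows `role : task`, which is the form --start-at-task accepts.

```shell
#!/usr/bin/env bash
# Sketch of a log-based "Resume" helper (hypothetical, not an Ansible feature).
# Assumes log_path is enabled in ansible.cfg and the log is cleared before
# each run, so the last "TASK [...]" banner belongs to the task that failed.

# Print the text inside the brackets of the last "TASK [...]" line.
last_task() {
  grep 'TASK \[' "$1" | tail -n 1 | sed -n 's/.*TASK \[\(.*\)].*/\1/p'
}

# Re-run a playbook starting at the last task recorded in the log.
resume() {
  local log="$1" playbook="$2" task
  shift 2
  task="$(last_task "$log")"
  if [ -z "$task" ]; then
    echo "no TASK line found in $log" >&2
    return 1
  fi
  # Clear the log so that the next failure is again the last TASK line.
  : > "$log"
  ansible-playbook "$playbook" --start-at-task="$task" "$@"
}

# Usage: ./resume.sh /var/log/ansible.log main.yaml [extra ansible-playbook args]
if [ "$#" -ge 2 ]; then
  resume "$@"
fi
```

Matching the last "]" on the banner line keeps task names that contain spaces or a role prefix intact; a task name that itself contains "]" would need a stricter pattern.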

The Ansible team is unwilling to create this basic (and very useful) feature, so the only option is to hack it together with some Bash scripts.

Trent
  • Maybe the Ansible team is kept busy developing all those half-redundant yet error-prone concepts like *roles*, *group_vars*/*host_group*, *tags*, *when*/conditionals, and *handlers*. The Ansible team seems to consider this feature-bloat development strategy more important than a feature like the one you, imho rightly, want. A good test for anyone considering Ansible is to check how unclearly failures in the core parts (the SSH connection) are reported. Ansible remains a challenge. – fraleone Mar 27 '20 at 04:42