How to repeat an Ansible task until the result is failed and show timestamps of every retry?

Question

I am trying to solve a network automation issue. We see strange behaviour from network devices (SNOM phones) connected in a chain to a certain Cisco switch port. One of these phones (a different one every time) disappears at random, and after that the device can't get an IP address via DHCP. We still haven't found a way to reproduce the issue, so I've enabled debug logs on the DHCP server and am now waiting for one of the MAC addresses to disappear from the switch interface's MAC address table.

Since Cisco does not support the Linux 'watch' command, I wrote a simple Ansible playbook for this purpose:

---
- name: show mac address-table 
  hosts: ios
  gather_facts: no


  tasks:

  - name: show mac address-table interface Fa0/31
    ios_command:
      commands: show mac address-table interface Fa0/31
      wait_for:
        - result[0] contains 0004.1341.799e
        - result[0] contains 0004.134a.f67d
        - result[0] contains 0004.138e.1a53
    register: result
    until: result is failed
    retries: 1000
  - debug: var=result

But with that configuration I see only

FAILED - RETRYING: show mac address-table interface Fa0/31 (660 retries left).
FAILED - RETRYING: show mac address-table interface Fa0/31 (659 retries left).
FAILED - RETRYING: show mac address-table interface Fa0/31 (658 retries left).
FAILED - RETRYING: show mac address-table interface Fa0/31 (657 retries left).

in the output. I tried the anstomlog callback plugin, but it shows timestamps only for succeeded conditions (in my case, when the result is failed).

So I am looking for advice on how to achieve both goals:

run the task forever until its status becomes failed
write a timestamp for every single retry

Thanks in advance!

score 0 · Answer 1 · answered Mar 01 '19 at 12:29

It's better to rewrite it as a normal loop (with include_tasks) and report all the information you need in that task.

Relying on 'retries' as a watchdog is not a great idea.

Moreover, I think it's better to rewrite it as an independent program. If you are worried about SSH to the switch, netmiko is a great collection of ready-to-use quirks for all network devices. Its connection object has a 'send_command' method to execute commands on switches.
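For illustration, here is a minimal sketch of what such a standalone watcher might look like. It is not a tested production script: the host, username and password are placeholders, and the helper names (missing_macs, watch, main) are made up for this example. It polls the switch, prints a timestamp on every poll, and stops as soon as one of the expected MAC addresses is missing.

```python
import time
from datetime import datetime

# MAC addresses from the question, in Cisco dotted notation.
EXPECTED_MACS = ["0004.1341.799e", "0004.134a.f67d", "0004.138e.1a53"]


def missing_macs(output, expected):
    """Return the expected MACs that do not appear in the command output."""
    text = output.lower()
    return [mac for mac in expected if mac.lower() not in text]


def watch(run_command, expected, interval=5.0, max_polls=None, log=print):
    """Poll until at least one expected MAC is missing, logging a timestamp per poll.

    run_command is any zero-argument callable that returns the command output.
    Returns the list of missing MACs, or [] if max_polls ran out first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        missing = missing_macs(run_command(), expected)
        stamp = datetime.now().isoformat(timespec="seconds")
        log(f"{stamp} poll {polls}: missing={missing or 'none'}")
        if missing:
            return missing
        time.sleep(interval)
    return []


def main():
    # Imported here so the helpers above work without netmiko installed.
    from netmiko import ConnectHandler

    # Placeholder connection details (assumptions); replace with real ones.
    conn = ConnectHandler(
        device_type="cisco_ios",
        host="192.0.2.10",
        username="admin",
        password="secret",
    )
    try:
        watch(
            lambda: conn.send_command("show mac address-table interface Fa0/31"),
            EXPECTED_MACS,
        )
    finally:
        conn.disconnect()
```

Calling main() would run it against a real switch. Because watch() takes any callable, the loop can also be tried out with canned output before pointing it at a device.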

score 0 · Answer 2 · answered Mar 06 '19 at 13:33

Well, as the initial question was about Ansible, I solved the issue by saving the timestamp, getting the DHCP log from the router, and filtering the log by timestamp and by MAC addresses:

---
- name: Find switch port by host ip address
  hosts: all
  gather_facts: no
  connection: local
  roles:
    - Juniper.junos  
  vars:
    systime: "{{ ansible_date_time.time }}"
    timestamp: "{{ ansible_date_time.date }}_{{ systime }}"
    connection_settings:
      host: "{{ ansible_host }}"
      timeout: 120
    snom_mac_addresses:
      - '00_04:13_41:79_9e'
      - '00_04:13_4a:f6_7d'
      - '00_04:13_8e:1a_53'

  tasks:

  - name: show mac address-table interface Fa0/31
    ios_command:
      commands: show mac address-table interface Fa0/31
      wait_for:
        - result[0] contains {{ snom_mac_addresses[0] | replace(':', '.')| replace('_', '') }}
        - result[0] contains {{ snom_mac_addresses[1] | replace(':', '.')| replace('_', '') }}
        - result[0] contains {{ snom_mac_addresses[2] | replace(':', '.')| replace('_', '') }}
    register: result
    until: result is failed
    retries: 1000
    ignore_errors: True
    when: inventory_hostname == 'access-switch'


  - name: save timestamp in Junos format
    set_fact: 
      junos_timestamp: "{{ lookup('pipe','date +%b_%_d_%H:%M') | replace('_', ' ') }}"
    run_once: yes
    delegate_to: localhost

  - debug: 
      var: junos_timestamp
    run_once: yes
    delegate_to: localhost

  - name: get dhcp log from router
    junos_scp:
      provider: "{{ connection_settings }}"
      src: /var/log/dhcp-service.log
      remote_src: true
    when: inventory_hostname == 'router'

  - name: filter log for time
    run_once: yes
    shell: "egrep -i '{{ junos_timestamp }}' dhcp-service.log"
    register: grep_time_output
    delegate_to: localhost

  - debug: var=grep_time_output.stdout_lines    

  - name: filter log for mac addresses
    run_once: yes
    shell: "egrep -i '{{ snom_mac_addresses | join('|') | replace(':', ' ')| replace('_', ' ') }}' dhcp-service.log"
    register: grep_mac_output
    delegate_to: localhost

  - debug: var=grep_mac_output.stdout_lines

I'm pretty sure it doesn't look like an elegant solution, but at least I did all the work within a single Ansible environment, and anyone can reuse part of my code without significant refactoring.

Just one doubt: I have to use my own format for MAC addresses, because the Cisco output and the Juniper debug log print them differently:

Juniper debug log:

Mar  6 13:14:19.582886 [MSTR][DEBUG] client_key_compose: Composing key (0x1c6aa00) for cid_l 7, cid d4 a3 3d a1 e2 38, mac d4 a3 3d a1 e2 38, htype 1, subnet 10.111.111.1, ifindx 0, opt82_l 0, opt82 NULL

Cisco:

 30    0004.133d.39fb    DYNAMIC     Po1

But maybe there is a clever way to handle all different formats for mac addresses in Ansible.
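Outside of Ansible, one way to avoid a custom intermediate format is to strip every separator and re-format on demand. The sketch below is only an illustration of that idea, and the function names are invented for it: it reduces any common notation to 12 hex digits and then emits either the Cisco dotted form or the space-separated form seen in the Juniper debug log.

```python
import re


def normalize_mac(mac):
    """Reduce any common MAC notation to 12 lowercase hex digits."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def to_cisco(mac):
    """Format as Cisco dotted notation, e.g. 0004.1341.799e."""
    d = normalize_mac(mac)
    return ".".join(d[i:i + 4] for i in range(0, 12, 4))


def to_spaced(mac):
    """Format as space-separated octets, as in the Juniper debug log."""
    d = normalize_mac(mac)
    return " ".join(d[i:i + 2] for i in range(0, 12, 2))
```

With this, the address list can be kept in any single notation (for example plain colon-separated), and each device-specific form is derived where it is needed.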
