
I have the playbook test1.yml below, which gets istat data for the 26 subfolders under the directory /var/myfile/pdf.

  tasks:

    - name: List directories
      raw: "ls -d /var/myfile/pdf/*/"
      register: subdir

    - name: List pid files
      raw: "istat {{ item }}"
      with_items: "{{ subdir.stdout_lines }}"

I run the playbook and it takes 29 seconds to complete.

time ANSIBLE_SSH_PIPELINING=True ansible-playbook -i=10.9.9.12, -f 30 test1.yml -vvv

When the playbook completes, the timing output is:

    real    0m29.144s
    user    0m6.206s
    sys     0m5.618s

I then moved the istat task into a separate file that is loaded with include_tasks, as shown below.

Playbook test2.yml

  tasks:

    - name: List directories
      raw: "ls -d /var/myfile/pdf/*/"
      register: subdir

    - name: List pid files
      include_tasks: "innertest.yml"
      with_items: "{{ subdir.stdout_lines }}"

Included file innertest.yml

    - raw: "istat {{ item }}"

time ANSIBLE_SSH_PIPELINING=True ansible-playbook -i=10.9.9.12, -f 30 test2.yml -vvv

Output:

    real    0m59.044s
    user    0m18.203s
    sys     0m10.118s

As you can see, the run time for the same number of tasks has more than doubled because of include_tasks.

In the debug output, I also see that 26 SSH connections are opened to the same target host 10.9.9.12, one for each of the 26 sub-directories in with_items.

I'm not sure how this works internally, but it would have been nice to have a single SSH connection for istat for 26 sub-directories on the same host for performance reasons.

Is there a way to increase the performance of include_tasks and bring down the number of SSH connections to the same host?

U880D
Ashar

1 Answer

Is there a way to increase the performance ... and bring down the number of SSH connections to the same host?

I understand that you would like to know how to increase the performance and decrease the execution time of specific tasks.

To achieve your goal, have a look at the following example.

---
- hosts: test
  become: false
  gather_facts: false

  tasks:

  - name: Gather subdirectories
    shell:
      cmd: "ls -d /home/{{ ansible_user }}/*/"
      warn: false
    register: subdirs

  - name: Gather stats (loop)
    shell:
      cmd: "stat {{ item }}"
      warn: false
    loop: "{{ subdirs.stdout_lines }}"
    loop_control:
      label: "{{ item }}"

  - name: Gather stats (list)
    shell:
      cmd: "stat {% raw %}{{% endraw %}{{ subdirs.stdout_lines | join(',') }}{% raw %}}{% endraw %}"
      warn: false
    register: result

  - name: Show result
    debug:
      var: result.stdout

This results in the following execution and runtimes:

TASK [Gather subdirectories] ***********************
changed: [test.example.com]
Sunday 31 July 2022 (0:00:01.412) 0:00:01.448 ******

TASK [Gather stats (loop)] *************************
changed: [test.example.com] => (item=/home/user/01/)
changed: [test.example.com] => (item=/home/user/02/)
changed: [test.example.com] => (item=/home/user/03/)
...
changed: [test.example.com] => (item=/home/user/24/)
changed: [test.example.com] => (item=/home/user/25/)
changed: [test.example.com] => (item=/home/user/26/)
Sunday 31 July 2022 (0:00:31.715) 0:00:33.164 ******

TASK [Gather stats (list)] *************************
changed: [test.example.com]
Sunday 31 July 2022 (0:00:01.361) 0:00:34.525 ******

Gather subdirectories ------------------------ 1.41s
Gather stats (loop) ------------------------- 31.72s
Gather stats (list) -------------------------- 1.36s
Show result ---------------------------------- 0.08s

I've also tested with the raw module instead of the shell module:

Gather stats (loop) via 'raw' ---------------- 4.96s
Gather stats (list) via 'raw' ---------------- 0.27s

Looping over a command and passing one parameter per run results in a lot of overhead and in multiple SSH connections. Providing the whole list to the command directly might be possible instead, and that would increase performance and decrease runtime and resource consumption.
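
The same idea carries over to the include_tasks variant from the question. The following is only an untested sketch: it assumes that innertest.yml can be changed and that no directory name contains whitespace, and istat_dirs is a hypothetical variable name. The file is included once and receives the whole list, so neither the include nor the command is repeated per item; the loop runs in the remote shell instead.

```yaml
# test2.yml (sketch): include the file once, without with_items
- name: Gather istat data
  include_tasks: innertest.yml
  vars:
    istat_dirs: "{{ subdir.stdout_lines }}"  # hypothetical variable name

# innertest.yml (sketch): one raw command handles every directory
- raw: 'for d in {{ istat_dirs | join(" ") }}; do istat "$d"; done'
```

Because the command still calls istat once per directory on the target, this does not depend on istat accepting several paths at once.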

it would have been nice to have a single SSH connection for istat for 26 sub-directories on the same host for performance reasons.

For this you would simply need to execute your task just once, which results in one SSH connection only. To do so, you might be able to provide a list of directories to the command directly via Ansible, for example

istat {01,02,03,...,24,25,26}

As you can see, the command needs to include curly braces, which need to be escaped in Ansible. The directory list needs to be a single string in which the directories are comma separated. For this you can use the join() filter.
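
To check the escaping before anything runs on the target, the string can be rendered with the debug module. A small sketch, using a hard-coded and purely hypothetical list of three names:

```yaml
- name: Render the command string only (nothing is executed remotely)
  debug:
    # {% raw %}{{% endraw %} emits a literal "{", the closing pair a literal "}"
    msg: "istat {% raw %}{{% endraw %}{{ ['01', '02', '03'] | join(',') }}{% raw %}}{% endraw %}"
```

This should print istat {01,02,03}. Note that brace expansion is a feature of shells such as bash and is not part of POSIX sh, so whether the braces are expanded depends on the shell that runs the command on the target host.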

Finally you would end up with

istat {% raw %}{{% endraw %}{{ subdir.stdout_lines | join(',') }}{% raw %}}{% endraw %}
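
If it is unclear whether istat accepts several paths in one call, or whether the remote shell expands braces, a variation is to let the remote shell loop inside a single task. This is an untimed sketch under those assumptions; the host pattern all and the registered name result are placeholders, and the separate directory listing task is no longer needed because the shell glob does that work.

```yaml
---
- hosts: all
  gather_facts: false

  tasks:

    # One task, one remote command: the shell on the target expands the
    # glob and loops over the subdirectories, so Ansible runs nothing per item.
    - name: Gather istat data for all subdirectories in one run
      raw: 'for d in /var/myfile/pdf/*/; do istat "$d"; done'
      register: result

    - name: Show result
      debug:
        var: result.stdout_lines
```

The output of all istat calls arrives in one result, so it has to be split per directory afterwards if the individual values are needed.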


U880D