... the time it takes for 1 VM can be 2-5 sec, which makes it very inefficient when I want to start 50 VMs ...
Right, this is the usual behavior.
Is there any way to make it in parallel?
As Vladimir Botka already mentioned in the comments, asynchronous actions and polling are worth a try, since
By default Ansible runs tasks synchronously, holding the connection to the remote node open until the action is completed. This means within a playbook, each task blocks the next task by default, meaning subsequent tasks will not run until the current task completes. This behavior can create challenges.
In your case you can see this behavior both in the task itself and in the loop.
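Below is a minimal sketch of that fire-and-forget pattern applied to the question. It makes some assumptions:

* The VM names are in a list variable `hostlist`.
* The vCenter connection uses the hypothetical variables `vcenter_hostname`, `vcenter_username` and `vcenter_password`.
* The `async: 120` limit, the `retries` value and the `delay` value are assumed values, not tested ones.

Each loop iteration is started in the background with `poll: 0`, so the loop does not wait for each VM to power on. `async_status` then collects the results afterwards. Whether this really saves time depends on how the per-VM runtime compares to the overhead of starting each task.

```yaml
---
- hosts: localhost
  gather_facts: false

  tasks:

    # Start one background job per VM; with poll: 0 each iteration returns immediately
    - name: Power on VMs asynchronously
      community.vmware.vmware_guest_powerstate:
        hostname: "{{ vcenter_hostname }}"  # hypothetical variable
        username: "{{ vcenter_username }}"  # hypothetical variable
        password: "{{ vcenter_password }}"  # hypothetical variable
        validate_certs: false
        name: "{{ item }}"
        state: powered-on
      loop: "{{ hostlist }}"
      async: 120  # maximum runtime per job in seconds (assumed value)
      poll: 0
      register: power_jobs

    # Check every started job until it reports finished
    - name: Wait for power-on jobs
      async_status:
        jid: "{{ item.ansible_job_id }}"
      loop: "{{ power_jobs.results }}"
      loop_control:
        label: "{{ item.item }}"  # the VM name of the original loop item
      register: job_result
      until: job_result.finished
      retries: 30
      delay: 2
```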
Probably the best practice for addressing this use case, and for eliminating its cause, would be to enhance the module code.
According to the documentation (vmware_guest_powerstate module – Manages power states of virtual machines in vCenter) and the source (ansible-collections/community.vmware/blob/main/plugins/modules/vmware_guest_powerstate.py), the parameter name takes only one name, for one VM. If it were possible to pass a list of VM names ("{{ hostlist }}") to the module directly, there would be only one connection attempt, and the loop would happen on the Remote Node instead of on the Controller Node (... even if this is running on localhost in both cases).
To do so, one would need to start by declaring name=dict(type='list') instead of type='str' in the module's argument spec. One would then implement all the remaining logic, error handling and responses.
Further Documentation
Since the community vmware_guest_powerstate module imports and utilizes additional libraries
Meanwhile and based on
Further Q&A and Tests
I've set up another short performance test to simulate the behavior you are observing:
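```yaml
---
- hosts: localhost
  become: false
  gather_facts: false

  tasks:

    - name: Gather subdirectories
      shell:
        cmd: "ls -d /home/{{ ansible_user }}/*/"
        warn: false  # removed in ansible-core 2.14; drop this line on newer versions
      register: subdirs

    # One background job per directory, not waited for (poll: 0)
    - name: Gather stats (loop) async
      shell: "stat {{ item }}"
      loop: "{{ subdirs.stdout_lines }}"
      loop_control:
        label: "{{ item }}"
      async: 5
      poll: 0

    # One task execution per directory, one after another
    - name: Gather stats (loop) serial
      shell: "stat {{ item }}"
      loop: "{{ subdirs.stdout_lines }}"
      loop_control:
        label: "{{ item }}"

    # A single task execution for all directories; renders 'stat {dir1/,dir2/,...}'
    # and relies on the shell's brace expansion (bash, not dash)
    - name: Gather stats (list)
      shell: "stat {% raw %}{{% endraw %}{{ subdirs.stdout_lines | join(',') }}{% raw %}}{% endraw %}"
      register: result

    - name: Show result
      debug:
        var: result.stdout
```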
I found that adding async adds some overhead, which results in an even longer execution time:
```
Gather subdirectories ------------------------ 0.57s
Gather stats (loop) async -------------------- 3.99s
Gather stats (loop) serial ------------------- 3.79s
Gather stats (list) -------------------------- 0.45s
Show result ---------------------------------- 0.07s
```
This is because the runtime of the executed task is "short" compared to the "long" time needed to establish a connection. As the documentation points out:
For example, a task may take longer to complete than the SSH session allows for, causing a timeout. Or you may want a long-running process to execute in the background while you perform other tasks concurrently. Asynchronous mode lets you control how long-running tasks execute.
So one benefits from async mainly for long-running processes and tasks.
Regarding the answer given by @Sonclay, I've performed another test with:
```yaml
---
- hosts: all
  become: false
  gather_facts: false

  tasks:

    - name: Gather subdirectories
      shell:
        cmd: "ls -d /home/{{ ansible_user }}/*/"
        warn: false  # removed in ansible-core 2.14; drop this line on newer versions
      register: subdirs
      delegate_to: localhost

    # Runs once per inventory host, in parallel up to --forks hosts at a time
    - name: Gather stats (loop) serial
      shell: "stat {{ item }}"
      loop: "{{ subdirs.stdout_lines }}"
      loop_control:
        label: "{{ item }}"
      delegate_to: localhost
```
A call with

```shell
ansible-playbook -i "test1.example.com,test2.example.com,test3.example.com" --forks 3 test.yml
```

results in the following execution time:
```
Gather subdirectories ------------------------ 0.72s
Gather stats (loop) -------------------------- 0.39s
```
so it seems to be worth a try.
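Applied to the original question, the same idea could look like the sketch below. It makes some assumptions:

* The VM names are used as inventory hostnames. `vm01`, `vm02`, `vm03` and `power_on.yml` are hypothetical names.
* The vCenter connection uses the same hypothetical `vcenter_*` variables as above.

Each inventory host runs the power-on task delegated to localhost. Ansible then processes up to `--forks` VMs at the same time instead of looping over them one after another.

```yaml
---
# Run e.g. with: ansible-playbook -i "vm01,vm02,vm03," --forks 50 power_on.yml
- hosts: all
  gather_facts: false

  tasks:

    - name: Power on this VM via vCenter
      community.vmware.vmware_guest_powerstate:
        hostname: "{{ vcenter_hostname }}"  # hypothetical variable
        username: "{{ vcenter_username }}"  # hypothetical variable
        password: "{{ vcenter_password }}"  # hypothetical variable
        validate_certs: false
        name: "{{ inventory_hostname }}"  # VM name taken from the inventory
        state: powered-on
      delegate_to: localhost  # the API call is made from the Controller Node
```

Since the default value of `--forks` is 5, it needs to be raised explicitly to power on more VMs in parallel.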