31

I'm trying to reboot a server running CentOS 7 on VirtualBox. I use this task:

- name: Restart server
  command: /sbin/reboot
  async: 0
  poll: 0
  ignore_errors: true

The server is rebooted, but I get this error:

TASK: [common | Restart server] ***********************************************
fatal: [rolcabox] => SSH Error: Shared connection to 127.0.0.1 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.

FATAL: all hosts have already failed -- aborting

What am I doing wrong? How can I fix this?

Domen Blenkuš
  • Based on the answer provided by Marcin Skarbek I prepared and published to Ansible Galaxy a role that uses that method. You can find the Reboot-And-Wait role [here](https://galaxy.ansible.com/it-praktyk/Reboot-And-Wait/). Thanks for using it; feedback is welcome. – Wojciech Sciesinski Mar 17 '17 at 23:22
  • Due to Ansible's quick pace of development, the older answers are not working for me anymore. Please have a look at my answer. – Telegrapher Nov 22 '17 at 00:50

11 Answers

43

You're likely not doing anything truly wrong, it's just that /sbin/reboot is shutting down the server so quickly that the server is tearing down the SSH connection used by Ansible before Ansible itself can close it. As a result Ansible is reporting an error because it sees the SSH connection failing for an unexpected reason.

What you might want to do to get around this is to switch from using /sbin/reboot to using /sbin/shutdown instead. The shutdown command lets you pass a time, and when combined with the -r switch it will perform a reboot rather than actually shutting down. So you might want to try a task like this:

- name: Restart server
  command: /sbin/shutdown -r +1
  async: 0
  poll: 0
  ignore_errors: true

This will delay the server reboot for 1 minute, but in doing so it should give Ansible enough time to close the SSH connection itself, thereby avoiding the error that you're currently getting.
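
As a side note not found in the original answer: because the reboot is now scheduled a minute out, a play that fails in a later task leaves that shutdown pending. On systemd-based distros such as CentOS 7, shutdown -c cancels a scheduled shutdown, so a cleanup task could look roughly like this sketch:

- name: Cancel a pending scheduled reboot (illustrative sketch)
  command: /sbin/shutdown -c
  ignore_errors: true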

approxiblue
Bruce P
  • 4
    Thanks, this works great! There was just one small catch: I got a ```Failed to parse time specification: +1m``` error, so I had to replace ```+1m``` with ```+1```. – Domen Blenkuš Apr 30 '15 at 08:23
  • Note: using reboot is not a good idea unless you know what you are doing. It may be OK in a lot of Linux distros, but on other Unixes it can bypass a lot of the system shutdown scripts and amounts to a very hard stop. This can lead to inconsistent DBs etc., so it's bad practice. Better to use shutdown or init. – krad Feb 06 '18 at 16:02
  • What is the purpose of async and poll here? – Basil Musa Jun 19 '18 at 16:32
  • 1
    @Basil, [see the docs here](https://docs.ansible.com/ansible/latest/user_guide/playbooks_async.html). ;) – Paul Hodges Jun 28 '18 at 14:57
12

After the reboot task, you should have a local_action task that waits for the remote host to finish rebooting; otherwise the SSH connection will be terminated, and so will the playbook.


- name: Reboot server
  command: /sbin/reboot

- name: Wait for the server to finish rebooting
  sudo: no
  local_action: wait_for host="{{ inventory_hostname }}" search_regex=OpenSSH port=22 timeout=300

I also wrote a blog post about achieving a similar solution: https://oguya.github.io/linux/2015/02/22/ansible-reboot-servers/

James Oguya
10

- name: restart server
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  become: true
  ignore_errors: true


- name: waiting for the server to come back
  local_action: wait_for host=testcentos state=started delay=30 timeout=300
  sudo: false
Saad
7

Another solution:

- name: reboot host
  command: /usr/bin/systemd-run --on-active=10 /usr/bin/systemctl reboot
  async: 0
  poll: 0

- name: wait for host sshd
  local_action: wait_for host="{{ inventory_hostname }}" search_regex=OpenSSH port=22 timeout=300 delay=30

systemd-run creates a new service "on the fly" that will start systemctl reboot after a 10-second delay (--on-active=10). The delay=30 in wait_for adds an extra 20 seconds to make sure the host has actually started rebooting.
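
As an aside (not from the original answer), systemd-run registers a transient timer unit, so you can verify what it scheduled before the reboot fires. A minimal sketch, assuming a systemd host:

- name: List pending transient timers (illustrative check)
  command: /usr/bin/systemctl list-timers --all
  register: timer_list
  changed_when: false

- name: Show pending timers
  debug:
    var: timer_list.stdout_lines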

Marcin Skarbek
6

None of the above solutions worked reliably for me.

Issuing a /sbin/reboot crashes the play (the SSH connection is closed before Ansible finishes the task; it crashes even with ignore_errors: true), and /usr/bin/systemd-run --on-active=2 /usr/bin/systemctl reboot will not reboot after 2 seconds but after a random amount of time, anywhere between 20 seconds and one minute, so the delay is sometimes not sufficient and the timing is not predictable.

Also, I don't want to wait for minutes when a cloud server can reboot in a few seconds.

So here is my solution:

- name: Reboot the server for kernel update
  shell: ( sleep 3 && /sbin/reboot & )
  async: 0
  poll: 0 

- name: Wait for the server to reboot
  local_action: wait_for host="{{ansible_host}}" delay=15 state=started port="{{ansible_port}}" connect_timeout=10 timeout=180

That's the shell: ( sleep 3 && /sbin/reboot & ) line that does the trick.

Using ( command & ) in a shell script runs a program in the background and detaches it: the command succeeds immediately, but the program persists after the shell is destroyed.

Ansible gets its response immediately, and the server reboots 3 seconds later.
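
To make that detach behavior concrete, here is a hypothetical demo task (the marker path is made up, not part of the answer): the shell module returns as soon as the subshell exits, while the backgrounded command keeps running.

- name: Demonstrate the detached subshell (illustrative only)
  shell: ( sleep 3 && touch /tmp/detach-demo & )
  # This task finishes immediately; /tmp/detach-demo appears about 3 seconds
  # later because the backgrounded command outlives the shell that spawned it.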

cronvel
  • How does putting the /sbin/reboot command in the background help? – mbigras Oct 05 '18 at 23:33
  • @mbigras Because it doesn't block Ansible's execution, so the second action is executed immediately. Without that, it would not work: Ansible would wait for the command to return, but would lose the connection during the system's shutdown. I believe the latest Ansible releases have a better solution, but I didn't investigate further. – cronvel Oct 07 '18 at 09:31
  • @mbigras Anyway, whatever the reason, other answers simply don't work for me reliably. – cronvel Oct 07 '18 at 09:34
5

Ansible is developing quickly and the older answers were not working for me.

I found two issues:

  • The recommended way of rebooting may kill the SSH connection before Ansible finishes the task.

It is better to run: nohup bash -c "sleep 2s && shutdown -r now" &

This will launch a shell with the sleep && shutdown, but will not wait for the shell to end due to the last &. The sleep will give some time for the Ansible task to end before the reboot and the nohup will guarantee that bash doesn't get killed when the task ends.

  • The wait_for module is not reliably waiting for the SSH service.

It detects the port as open, probably opened by systemd, but when the next task runs, SSH is still not ready.

If you're using Ansible 2.3+, wait_for_connection works reliably.

The best 'reboot and wait' in my experience (I am using Ansible 2.4) is the following:

- name: Reboot the machine
  shell: nohup bash -c "sleep 2s && shutdown -r now" &

- name: Wait for machine to come back
  wait_for_connection:
    timeout: 240
    delay: 20

I've got the nohup command from: https://github.com/keithchambers/microservices-playground/blob/master/playbooks/upgrade-packages.yml

I edited this message to:

  • add krad's portability suggestion, using shutdown -r now instead of reboot
  • add a delay; it is needed to prevent Ansible from executing the next step if the reboot is slow
  • increase the timeout; 120s was too little for some slow BIOSes.
Telegrapher
  • As per my previous comments, it's really bad to use reboot – krad Feb 06 '18 at 16:18
  • Sounds sensible in a non-Linux world. I've never found a linux where reboot wouldn't perform a proper organized reboot. – Telegrapher Feb 06 '18 at 16:39
  • Another advantage is that the shutdown command can include a timeout, without the need to use sleep, which is cleaner. I like your suggestion of portability, but I'll test it before changing it here, just in case. The answer I posted is the simplest of all and works reliably in the meantime. – Telegrapher Feb 06 '18 at 16:45
  • I've had a look, and I don't like one thing about shutdown: the minimum granularity of the delay is 1 minute, which is wasteful, so we can't stop using the sleep. – Telegrapher Apr 03 '18 at 12:53
  • Just use 'shutdown now' with a sleep in front. The world is bigger than linux. – krad Apr 04 '18 at 12:06
  • I'm using the comments as thought process documentation. I would've expected that you would've had a look at the updated answer before your non-constructive comment. – Telegrapher Apr 05 '18 at 12:19
3

Yet another (combined from other answers) version:

---
- name: restart server
  command: /usr/bin/systemd-run --on-active=5 --timer-property=AccuracySec=100ms /usr/bin/systemctl reboot
  async: 0
  poll: 0
  ignore_errors: true
  become: yes

- name: wait for server {{ ansible_ssh_host | default(inventory_hostname) }} to come back online
  wait_for:
    port: 22
    state: started
    host: '{{ ansible_ssh_host | default(inventory_hostname) }}'
    delay: 30
  delegate_to: localhost
Andrzej Rehmann
3

The following solution works perfectly for me:

- name: Restart machine
  shell: "sleep 5 && sudo shutdown -r now"
  async: 1
  poll: 0

- name: wait for SSH to be available again
  wait_for_connection:
    connect_timeout: 20
    sleep: 5
    delay: 5
    timeout: 300

The sleep is required because Ansible needs a few seconds to wrap up the connection. An excellent post about this problem was written here: https://www.jeffgeerling.com/blog/2018/reboot-and-wait-reboot-complete-ansible-playbook

3

If you're using Ansible version >= 2.7, you can use the reboot module as described here.

The synopsis of the reboot module itself:

Reboot a machine, wait for it to go down, come back up, and respond to commands.

In its simplest form, you can define a task like this:

    - name: reboot server
      reboot:

But you can add some params, like test_command, to test whether your server is ready to take further tasks:

    - name: reboot server
      reboot:
        test_command: whoami
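
The reboot module also accepts timing parameters; for example, reboot_timeout bounds how long Ansible waits for the machine to come back (the value below is illustrative, not a recommendation):

    - name: reboot server
      reboot:
        test_command: whoami
        reboot_timeout: 300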

Hope this helps!

kxu
2

I am using Ansible 2.5.3. The code below works with ease:

- name: Rebooting host
  shell: 'shutdown -r +1 "Reboot triggered by Ansible"'

- wait_for_connection:
    delay: 90
    timeout: 300

You can reboot immediately, then insert a delay if your machine takes a while to go down:

    - name: Rebooting host
      shell: 'shutdown -r now "Reboot triggered by Ansible"'
      async: 1
      poll: 1
      ignore_errors: true

    # Wait 120 seconds to make sure the machine won't connect immediately in the next section.
    - name: Delay for the host to go down
      local_action: shell /bin/sleep 120

Then poll to make the playbook return as soon as possible:

    - name: Wait for the server to finish rebooting
      wait_for_connection:
        delay: 15
        sleep: 15
        timeout: 300

This will make the playbook return as soon as possible after the reboot.

Mike S
Ashwin
  • Hello, I find that this solution may work, but it is sub-optimal. I specified the reasons in my answer's comments: shutdown -r +1 is safe and will always work, but +1 adds one minute of delay to the reboot, which may be undesirable; shutdown -r now is not safe, since SSH may be killed before Ansible gets its answer back, returning a failure, hence the +1 or the nohup + sleep combination is needed. I would like to find a solution as simple as shutdown -r +1, but without the 1 min delay. – Telegrapher Jul 09 '18 at 14:39
  • Seems like your nohup answer is just another way to skin the cat. You've given the shell the job of exiting cleanly, so you don't have to tell Ansible to ignore the error. You say potaytoh, I say potahtoe. – Mike S Jul 11 '18 at 22:50
  • Well, it is usually better to avoid triggering an error than ignoring errors in general. By ignoring all possible shutdown errors, you may be ignoring another error that should be noticed. – Telegrapher Sep 13 '18 at 18:12
1

At reboot time all SSH connections are closed. That's why the Ansible task fails. The ignore_errors: true or failed_when: false additions no longer work as of Ansible 1.9.x, because the handling of SSH connections has changed and a closed connection is now a fatal error which cannot be caught during the play.

The only way I figured out to do this is to run a local shell task which starts a separate SSH connection; that connection may then fail without failing the play.

- name: Rebooting
  delegate_to: localhost
  shell: ssh -S "none" {{ inventory_hostname }} "sudo /usr/sbin/reboot"
  failed_when: false
  changed_when: true
udondan
  • Thanks for the explanation, but your approach probably requires passwordless sudo (or did I miss something?), so I can't use it in production. – Domen Blenkuš Apr 30 '15 at 10:19