
This is less a technical question, but maybe Ansible has features that would help here that I don't know about yet. I'm able to automate patching with Ansible, but choosing the right hosts and groups in the right order is complicated, so I'll try to explain.

Let's take this inventory as an example:

```yaml
---
all:
  dcs:
    hosts:
      domaincontroller1
      domaincontroller2
  dbs:
    hosts:
      sql1
      sql2
  webservers:
    hosts:
      websrv1 #has a mysql connection and services vars
      websrv2
      websrv3 #has a mysql connection and services vars
      websrv4
```

So what do you do on patch day? You want at least one domain controller running at all times. You want all web servers that connect to SQL to be down, or their services stopped. After that, you first patch the SQL servers and wait until they are running again, then patch the web servers and wait until they reconnect to SQL.

At the moment, I split the hosts file into two groups. The first group is one DC plus all servers that don't connect to SQL. The second group contains sql1, sql2, websrv1, ... and a different playbook patches the first two in the list first and all the others after that. But doing this leaves me with an ugly, unsorted hosts file, and I can no longer apply changes to, for example, all web servers.

```yaml
---
all:
  patch1:
    hosts:
      domaincontroller1
      websrv2
      websrv4
  patch2:
    hosts:
      sql1
      sql2
      domaincontroller2
      websrv1 #has a mysql connection and services vars
      websrv3 #has a mysql connection and services vars
```

How do others do that? Is there a way to split groups in half, so that there is a patch1 group containing 50% of the DCs and all web servers that have no services defined (probably with dynamic groups)? Otherwise I would need to create the perfectly grouped inventory and add groups for patch days 1 and 2 underneath it, which results in having one server multiple times in the same inventory and makes changes more complicated.

Another idea would be to use tags like patchfirst and patchsecond, and create a host_vars file for every server, which again is a lot of work for about 100 hosts. Does anyone have ideas or examples for getting the cleanest, best-working result without more work than manual patching would need?

β.εηοιτ.βε
EdFred
  • There was a somehow similar question recently [How to conditionally define hosts for a play in Ansible?](https://stackoverflow.com/questions/69554894/). – U880D Oct 29 '21 at 07:04
  • Right. The difference is that here "having one server multiple times in the same inventory", according to the questioner, "makes changes more complicated". – Vladimir Botka Oct 29 '21 at 07:18
  • "Nice question. But, it's offtopic here. Move it to serverfault" I think that depends a bit on the solution, and I thought this was the best place for Ansible questions. "There was a somehow similar question recently How to conditionally define hosts for a play in Ansible?" This is definitely a solution, but putting the update logic into the inventory itself would blow it up from 100 lines for 100 servers to 200+. And if I change one server, it needs to be adjusted in several places in the inventory. I would prefer: new SQL server → add it to the sql group → it is automatically patched first on the next patch day. – EdFred Oct 29 '21 at 08:24
  • You sure can create [dynamic host groups](https://docs.ansible.com/ansible/latest/collections/ansible/builtin/group_module.html) and [assign hosts in it](https://docs.ansible.com/ansible/latest/collections/ansible/builtin/add_host_module.html). You could also select a [`random`](https://jinja.palletsprojects.com/en/3.0.x/templates/#jinja-filters.random) controller that should stay up to put in those randomly created groups. – β.εηοιτ.βε Oct 29 '21 at 14:52
  • I think dynamic groups would be a good idea, even though I'm afraid of missing hosts or using them twice (for example when taking 50% of the DCs if there are 3), but maybe I'll try that. Choosing something random is not for me; I want to keep the patching as consistent as I can, but thanks for the tip :) – EdFred Oct 29 '21 at 17:02
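The dynamic-group idea from the comments can also be sketched with `ansible.builtin.group_by`, which runs once per host and reads that host's own variables, so each host is put into exactly one `patch_day_*` group and none is missed or counted twice. This is a minimal, untested sketch that assumes the `dcs`, `dbs` and `webservers` groups and the `sql_connection` variable used in the answer below, and assumes `groups.dcs` keeps the inventory order:

```yaml
- name: Sort each host into exactly one patch-day group
  hosts: dcs:dbs:webservers
  gather_facts: false

  tasks:
    # First half of groups.dcs (by position) goes to day 1, the rest to day 2,
    # using the same (length / 2) | int cut-off as a list slice would.
    - name: Split the domain controllers in two halves
      ansible.builtin.group_by:
        key: "patch_day_{{ 1 if groups.dcs.index(inventory_hostname) < (groups.dcs | length / 2) | int else 2 }}"
      when: inventory_hostname in groups.dcs

    # Databases are always patched on day 2.
    - name: Put the databases on day 2
      ansible.builtin.group_by:
        key: patch_day_2
      when: inventory_hostname in groups.dbs

    # Web servers that depend on SQL follow the databases; the others go on day 1.
    - name: Put the web servers on day 1 or day 2
      ansible.builtin.group_by:
        key: "patch_day_{{ 2 if sql_connection is defined else 1 }}"
      when: inventory_hostname in groups.webservers
```

A later play in the same playbook can then target `hosts: "patch_day_{{ patch_day | default(1) }}"`; like groups created with `add_host`, these groups only exist for the duration of that `ansible-playbook` run.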

1 Answer


Here is a, probably naive, approach to achieve this.

To start with, you can create the groups in your inventory, but without any hosts in them.

Notes:

  • I fixed some issues in your inventory: it was missing the children keyword, and the hosts should have been YAML keys, so ending with a colon (:).
  • I added a sql_connection variable on some web servers, to drive the logic that decides whether a web server is patched on the same day as the databases.
    Adapt it to the existing variables defined on your hosts.
  • I also added some extra hosts, just for demonstration purpose.

So, we have an inventory that looks like:

```yaml
all:
  children:
    patch_day_1:
    patch_day_2:
    dcs:
      hosts:
        domaincontroller1:
        domaincontroller2:
        domaincontroller3:
        domaincontroller4:
        domaincontroller5:
    dbs:
      hosts:
        sql1:
        sql2:
        sql3:
        sql4:
        sql5:
    webservers:
      hosts:
        websrv1:
          sql_connection: mysql://some_connection_string
        websrv2:
        websrv3:
          sql_connection: mysql://some_connection_string
        websrv4:
        websrv5:
```

Then:

  • We elect the domain controllers with the help of array slicing notation.
  • We elect the web servers based on whether sql_connection is defined in their hostvars.
  • We elect all databases into the same group as the web servers that define a sql_connection.
  • Last, but not least, we pick the patch day at run time with an extra variable.

Here is the playbook doing this:

```yaml
- name: Logically split hosts
  hosts: localhost
  gather_facts: no

  tasks:
    - name: Elect half of the controllers in the first group
      add_host:
        name: "{{ item }}"
        groups: patch_day_1
      loop: "{{ groups.dcs[0:(groups.dcs | length / 2) | int] }}"

    - name: Elect all the web servers that do not have a `sql_connection` in the first group
      add_host:
        name: "{{ item }}"
        groups: patch_day_1
      loop: "{{ groups.webservers }}"
      when: hostvars[item].sql_connection is not defined

    - name: Elect the other half of the controllers in the second group
      add_host:
        name: "{{ item }}"
        groups: patch_day_2
      loop: "{{ groups.dcs[(groups.dcs | length / 2) | int:groups.dcs | length] }}"

    - name: Elect all the databases in the second group
      add_host:
        name: "{{ item }}"
        groups: patch_day_2
      loop: "{{ groups.dbs }}"

    - name: Elect all the web servers that do have a `sql_connection` in the second group
      add_host:
        name: "{{ item }}"
        groups: patch_day_2
      loop: "{{ groups.webservers }}"
      when: hostvars[item].sql_connection is defined

- name: Patch execution
  hosts: "patch_day_{{ patch_day | default(1) }}"
  gather_facts: no

  tasks:
    - name: Display hosts targeted in the play
      debug:
        var: ansible_play_hosts_all
      run_once: true
```

Run without extra variables, this defaults to the patch_day_1 group, yielding:

```
PLAY [Logically split hosts] *************************************************************************************

TASK [Elect half of the controllers in the first group] **********************************************************
ok: [localhost] => (item=domaincontroller1)
ok: [localhost] => (item=domaincontroller2)

TASK [Elect all the web servers that do not have a `sql_connection` in the first group] **************************
skipping: [localhost] => (item=websrv1)
ok: [localhost] => (item=websrv2)
skipping: [localhost] => (item=websrv3)
ok: [localhost] => (item=websrv4)
ok: [localhost] => (item=websrv5)

TASK [Elect the other half of the controllers in the second group] ***********************************************
ok: [localhost] => (item=domaincontroller3)
ok: [localhost] => (item=domaincontroller4)
ok: [localhost] => (item=domaincontroller5)

TASK [Elect all the databases in the second group] ***************************************************************
ok: [localhost] => (item=sql1)
ok: [localhost] => (item=sql2)
ok: [localhost] => (item=sql3)
ok: [localhost] => (item=sql4)
ok: [localhost] => (item=sql5)

TASK [Elect all the web servers that do have a `sql_connection` in the second group] *****************************
ok: [localhost] => (item=websrv1)
skipping: [localhost] => (item=websrv2)
ok: [localhost] => (item=websrv3)
skipping: [localhost] => (item=websrv4)
skipping: [localhost] => (item=websrv5)

PLAY [Patch execution] *******************************************************************************************

TASK [Display hosts targeted in the play] ************************************************************************
ok: [domaincontroller1] =>
  ansible_play_hosts_all:
  - domaincontroller1
  - domaincontroller2
  - websrv2
  - websrv4
  - websrv5

PLAY RECAP *******************************************************************************************************
domaincontroller1          : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
localhost                  : ok=6    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
```
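In the run above, five controllers are split 2 and 3: `(5 / 2) | int` is 2, so the first slice ends at index 2 and the second slice starts there. Because both slices share the same cut-off, each controller lands in exactly one group even with an odd count, which addresses the worry in the comments about 3 DCs: one goes to day 1 and two to day 2. With a single controller the cut-off is 0, so day 1 gets none. The hypothetical stand-alone play below (controller names are made up) prints the split for 1 to 5 controllers using the same expressions, if you want to check it yourself:

```yaml
# Hypothetical check, runs on the controller only; dc1..dc5 are made-up names.
- name: Check how the two slices split 1 to 5 domain controllers
  hosts: localhost
  gather_facts: false
  vars:
    sample_dcs: [dc1, dc2, dc3, dc4, dc5]

  tasks:
    # sample_dcs[0:item] stands in for groups.dcs with "item" controllers;
    # the cut-off is the same (count / 2) | int as in the playbook above.
    - name: Show day 1 and day 2 for the first N controllers
      ansible.builtin.debug:
        msg: >-
          {{ item }} DCs:
          day 1 = {{ sample_dcs[0:item][0:(item / 2) | int] }},
          day 2 = {{ sample_dcs[0:item][(item / 2) | int:item] }}
      loop: [1, 2, 3, 4, 5]
```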

Run with an extra variable, it targets patch_day_2 instead. Running it with the command

```shell
ansible-playbook play.yml --extra-vars="patch_day=2"
```

yields:

```
PLAY [Logically split hosts] *************************************************************************************

TASK [Elect half of the controllers in the first group] **********************************************************
ok: [localhost] => (item=domaincontroller1)
ok: [localhost] => (item=domaincontroller2)

TASK [Elect all the web servers that do not have a `sql_connection` in the first group] **************************
skipping: [localhost] => (item=websrv1)
ok: [localhost] => (item=websrv2)
skipping: [localhost] => (item=websrv3)
ok: [localhost] => (item=websrv4)
ok: [localhost] => (item=websrv5)

TASK [Elect the other half of the controllers in the second group] ***********************************************
ok: [localhost] => (item=domaincontroller3)
ok: [localhost] => (item=domaincontroller4)
ok: [localhost] => (item=domaincontroller5)

TASK [Elect all the databases in the second group] ***************************************************************
ok: [localhost] => (item=sql1)
ok: [localhost] => (item=sql2)
ok: [localhost] => (item=sql3)
ok: [localhost] => (item=sql4)
ok: [localhost] => (item=sql5)

TASK [Elect all the web servers that do have a `sql_connection` in the second group] *****************************
ok: [localhost] => (item=websrv1)
skipping: [localhost] => (item=websrv2)
ok: [localhost] => (item=websrv3)
skipping: [localhost] => (item=websrv4)
skipping: [localhost] => (item=websrv5)

PLAY [Patch execution] *******************************************************************************************

TASK [Display hosts targeted in the play] ************************************************************************
ok: [domaincontroller3] =>
  ansible_play_hosts_all:
  - domaincontroller3
  - domaincontroller4
  - domaincontroller5
  - sql1
  - sql2
  - sql3
  - sql4
  - sql5
  - websrv1
  - websrv3

PLAY RECAP *******************************************************************************************************
domaincontroller3          : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
localhost                  : ok=6    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
```
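The split decides which hosts are patched on which day, but not the order inside day 2 that the question describes (stop the SQL-dependent web services, patch the databases and wait for them, then patch the web servers). One way to sketch that ordering is with extra plays appended to the same playbook, after "Logically split hosts", using Ansible's `group:&other` intersection pattern. This is an untested sketch under several assumptions: `web_services` (a list of service names per web server), `patch.yml` (your OS-specific patch tasks) and MySQL's default port 3306 are hypothetical placeholders, the services are assumed to be manageable with `ansible.builtin.service`, the inventory names are assumed to resolve from the controller, and the domain controllers are left out:

```yaml
# Hypothetical: web_services, patch.yml and port 3306 are placeholders.
- name: Stop the SQL-dependent services on this day's web servers
  hosts: "patch_day_{{ patch_day | default(1) }}:&webservers"
  gather_facts: false
  tasks:
    - name: Stop every service listed in web_services
      ansible.builtin.service:
        name: "{{ item }}"
        state: stopped
      loop: "{{ web_services | default([]) }}"

- name: Patch this day's databases first
  hosts: "patch_day_{{ patch_day | default(1) }}:&dbs"
  gather_facts: false
  any_errors_fatal: true   # abort the whole run if one database fails
  tasks:
    - name: Apply the updates (OS-specific tasks, not shown)
      ansible.builtin.include_tasks: patch.yml

    - name: Wait from the controller until the database port answers again
      ansible.builtin.wait_for:
        host: "{{ inventory_hostname }}"
        port: 3306
        timeout: 900
      delegate_to: localhost

- name: Then patch this day's web servers and start their services again
  hosts: "patch_day_{{ patch_day | default(1) }}:&webservers"
  gather_facts: false
  tasks:
    - name: Apply the updates (OS-specific tasks, not shown)
      ansible.builtin.include_tasks: patch.yml

    - name: Start every service listed in web_services
      ansible.builtin.service:
        name: "{{ item }}"
        state: started
      loop: "{{ web_services | default([]) }}"
```

Since plays run one after another, every web server of that day has its services stopped before any database is touched, and no web server is patched before the database play has finished. On day 1 the databases play simply matches no hosts. Checking that the web servers have actually reconnected to SQL would need an application-specific test that is not shown here.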
β.εηοιτ.βε
  • I guess you are my hero :D This sounds really good without being too complicated. Regarding your first sentence "Here is a, probably naive, approach to achieve this", this is the first solution to this that I have ever seen. Maybe bigger companies use Tower, or just reboot databases with active connections (I have had very bad experiences doing this). What do you think is the standard way people would do this, with host_vars or a lot of extra groups? This looks like it could work very well for us, so thanks again; I just want to get as much input as I can on how others do it :D – EdFred Oct 29 '21 at 16:57
  • Well, your use case is quite peculiar, to be honest: you are not trying to do [blue/green deployment](https://martinfowler.com/bliki/BlueGreenDeployment.html), but just to logically group the web servers depending on whether they need a SQL connection. Big companies, in my experience, have stricter requirements that do not allow any downtime caused by patching or a deployment, so they resort to techniques that enable zero downtime, like blue/green deployment. – β.εηοιτ.βε Oct 29 '21 at 17:04