0

In my Ansible playbook there is a task which saves data into a csv at path /tmp/.

The generated csv has 4 columns: 'idm', 'idc', 'day', 'time'.

data example:

idm idc day         time
34  3   2023-02-09  12:57:34
56  6   2023-02-10  20:25:12
78  2   2023-02-11  04:01:00

Now, I would like to build a task to read the csv and edit it, so that I can order its columns like

ORDER BY day, time, idm, idc

and then overwrite the file.

How can I do it?

Note: If possible, I would like to do that by using python, so calling python from inside the playbook.

Here below is the code which generates the csv.

- name: Dump data into csv
  copy:
    dest: /tmp/data_{{ '%Y-%m-%d' | strftime }}.csv          

    content: |
      {{ ['idm', 'idc', 'day', 'time'] | map('trim') | join(';') }}
      {% for host in hosts_list %}
      {% set idm = host.inventory_hostname.split('_')[0].split('-')[1] %}
      {% set idm_padded = '%03d' % idm|int %}
      {% set idm_padded = '"' + idm_padded + '"' %}
      {% set idc = host.inventory_hostname.split('_')[1].upper() %}
      {% if host.grep_output.stdout_lines %}
      {% for line in host.grep_output.stdout_lines %}
      {% set day = line.strip().split(' ')[0] %}
      {% set time = line.strip().split(' ')[1] %}
      {{ [idm_padded, idc, day, time] | map('trim') | join(';') }}
      {% endfor %}
      {% endif %}
      {% endfor %}
      
  vars:
    hosts_list: "{{ ansible_play_hosts | map('extract', hostvars) | list }}"
  delegate_to: localhost
  register: csv_content
  run_once: yes

question update

I found out by reading this thread that csv can be ordered py using pure python like this

with open('unsorted.csv',newline='') as csvfile:
    spamreader = csv.DictReader(csvfile, delimiter=";")
    sortedlist = sorted(spamreader, key=lambda row:(row['column_1'],row['column_2']), reverse=False)


with open('sorted.csv', 'w') as f:
    fieldnames = ['column_1', 'column_2', column_3]
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for row in sortedlist:
        writer.writerow(row)

My question is just: how do I write an ansible task that runs these operations on the saved file?

Tms91
  • 3,456
  • 6
  • 40
  • 74
  • Do you have sample data handy so that one can get an idea about the content and how do you like to order? As you writing out `idm, idc, day, time` but `ORDER BY day, time, idm, idc`, should the column order change too? – U880D Feb 09 '23 at 17:50
  • have you seen https://stackoverflow.com/questions/53799730/how-to-read-csv-file-data-in-ansible-playbook-using-with-lines – Oliver Gaida Feb 09 '23 at 18:05
  • 1
    i found an open issue for sorting by multiple attributes in jinja2 : https://github.com/jinja2cpp/Jinja2Cpp/issues/111 . But at the top of https://github.com/pallets/jinja/issues/670 there seems to be a workaround for this. – Oliver Gaida Feb 09 '23 at 18:11
  • 1
    I would to start implementation and testing in several steps. I would also consider to process the data sort outside and separate of Ansible, YAML and Jinja2. First implementing a fast prototype with `shell` module and Linux `sort`, see [Bash sort CSV file by first 4 columns](https://stackoverflow.com/questions/11934809/). After, moving it into a separate Python Custom Module, see [Python: sort CSV by column](https://stackoverflow.com/questions/2100353/). – U880D Feb 09 '23 at 18:14
  • 1
    Thanks for the update and providing the Python sort. "_How do I write an Ansible task that runs these operations on the saved file?_", as mentioned before, that would be just a simple [Custom Module](https://docs.ansible.com/ansible/latest/dev_guide/developing_modules_general.html#creating-a-module), maybe named `sort_csv.py` which would then be called with `sort_csv:` and at least a parameter `path:`. [Here an example](https://stackoverflow.com/a/75250253/6771046) which could be used as starting point. – U880D Feb 09 '23 at 19:29

2 Answers2

2

Given the CSV file for testing

shell> cat /tmp/data_today.csv 
idm idc day time
34 3 2023-02-09 12:57:34
34 3 2023-02-09 12:56:34
56 6 2023-02-10 20:25:12
56 5 2023-02-10 20:25:12
78 2 2023-02-11 04:01:00
77 2 2023-02-11 04:01:00
78 2 2023-02-08 04:01:00

Declare the variable

  csv_content: "{{ data_today_csv.list }}"

and read the file

    - read_csv:
        path: /tmp/data_today.csv
        delimiter: ' '
      register: data_today_csv

gives

  csv_content:
    - {day: '2023-02-09', idc: '3', idm: '34', time: '12:57:34'}
    - {day: '2023-02-09', idc: '3', idm: '34', time: '12:56:34'}
    - {day: '2023-02-10', idc: '6', idm: '56', time: '20:25:12'}
    - {day: '2023-02-10', idc: '5', idm: '56', time: '20:25:12'}
    - {day: '2023-02-11', idc: '2', idm: '78', time: '04:01:00'}
    - {day: '2023-02-11', idc: '2', idm: '77', time: '04:01:00'}
    - {day: '2023-02-08', idc: '2', idm: '78', time: '04:01:00'}

Declare the list order_by and create the index

  order_by: [day, time, idm, idc]
  index: |
    {% for i in csv_content %}
    - index: {{ order_by|map('extract', i)|join('_') }}
    {% endfor %}

gives

  index: |-
    - index: 2023-02-09_12:57:34_34_3
    - index: 2023-02-09_12:56:34_34_3
    - index: 2023-02-10_20:25:12_56_6
    - index: 2023-02-10_20:25:12_56_5
    - index: 2023-02-11_04:01:00_78_2
    - index: 2023-02-11_04:01:00_77_2
    - index: 2023-02-08_04:01:00_78_2

zip and combine the lists

  csv_index: "{{ csv_content|zip(index|from_yaml)|map('combine') }}"

gives

  csv_index:
    - {day: '2023-02-09', idc: '3', idm: '34', index: '2023-02-09_12:57:34_34_3', time: '12:57:34'}
    - {day: '2023-02-09', idc: '3', idm: '34', index: '2023-02-09_12:56:34_34_3', time: '12:56:34'}
    - {day: '2023-02-10', idc: '6', idm: '56', index: '2023-02-10_20:25:12_56_6', time: '20:25:12'}
    - {day: '2023-02-10', idc: '5', idm: '56', index: '2023-02-10_20:25:12_56_5', time: '20:25:12'}
    - {day: '2023-02-11', idc: '2', idm: '78', index: '2023-02-11_04:01:00_78_2', time: '04:01:00'}
    - {day: '2023-02-11', idc: '2', idm: '77', index: '2023-02-11_04:01:00_77_2', time: '04:01:00'}
    - {day: '2023-02-08', idc: '2', idm: '78', index: '2023-02-08_04:01:00_78_2', time: '04:01:00'}

sort by the attribute index

  csv_sort: "{{ csv_index|sort(attribute='index') }}"

gives

  csv_sort:
    - {day: '2023-02-08', idc: '2', idm: '78', index: '2023-02-08_04:01:00_78_2', time: '04:01:00'}
    - {day: '2023-02-09', idc: '3', idm: '34', index: '2023-02-09_12:56:34_34_3', time: '12:56:34'}
    - {day: '2023-02-09', idc: '3', idm: '34', index: '2023-02-09_12:57:34_34_3', time: '12:57:34'}
    - {day: '2023-02-10', idc: '5', idm: '56', index: '2023-02-10_20:25:12_56_5', time: '20:25:12'}
    - {day: '2023-02-10', idc: '6', idm: '56', index: '2023-02-10_20:25:12_56_6', time: '20:25:12'}
    - {day: '2023-02-11', idc: '2', idm: '77', index: '2023-02-11_04:01:00_77_2', time: '04:01:00'}
    - {day: '2023-02-11', idc: '2', idm: '78', index: '2023-02-11_04:01:00_78_2', time: '04:01:00'}

Use Jinja to write the sorted data

    - debug:
        msg: |
          {% for i in csv_sort %}
          {{ order_by|map('extract', i)|join(' ') }}
          {% endfor %}

gives

  msg: |-
    2023-02-08 04:01:00 78 2
    2023-02-09 12:56:34 34 3
    2023-02-09 12:57:34 34 3
    2023-02-10 20:25:12 56 5
    2023-02-10 20:25:12 56 6
    2023-02-11 04:01:00 77 2
    2023-02-11 04:01:00 78 2

This template works if all columns are in the list order_by. The last example would work also with a subset of columns in order_by.


Example of a complete playbook for testing

- hosts: localhost

  vars:

    order_by: [day, time, idm, idc]
    csv_content: "{{ data_today_csv.list }}"
    index: |
      {% for i in csv_content %}
      - index: {{ order_by|map('extract', i)|join('_') }}
      {% endfor %}
    csv_index: "{{ csv_content|zip(index|from_yaml)|map('combine') }}"
    csv_sort: "{{ csv_index|sort(attribute='index') }}"

  tasks:

    - read_csv:
        path: /tmp/data_today.csv
        delimiter: ' '
      register: data_today_csv
    - debug:
        var: csv_content|to_yaml

    - debug:
        var: index
    - debug:
        var: csv_index|to_yaml
    - debug:
        var: csv_sort|to_yaml
    - debug:
        msg: |
          {% for i in csv_sort %}
          {{ order_by|map('extract', i)|join(' ') }}
          {% endfor %}

Example of a playbook to write the sorted file

- hosts: localhost

  vars:

    delimiter: ' '
    csv_content: "{{ data_today_csv.list }}"
    order_by: [day, time, idm, idc]
    index: |
      {% for i in csv_content %}
      - index: {{ order_by|map('extract', i)|join('_') }}
      {% endfor %}
    csv_sort: "{{ csv_content|
                  zip(index|from_yaml)|
                  map('combine')|
                  sort(attribute='index') }}"

  tasks:

    - read_csv:
        path: /tmp/data_today.csv
        delimiter: "{{ delimiter }}"
      register: data_today_csv

    - copy:
        dest: /tmp/data_today_order.csv
        content: |
          {{ _keys|join(delimiter) }}
          {% for i in csv_sort %}
          {{ _keys|map('extract', i)|join(delimiter) }}
          {% endfor %}
      vars:
        _keys: "{{ csv_content.0.keys()|list }}"

gives

shell> cat /tmp/data_today_order.csv 
idm idc day time
78 2 2023-02-08 04:01:00
34 3 2023-02-09 12:56:34
34 3 2023-02-09 12:57:34
56 5 2023-02-10 20:25:12
56 6 2023-02-10 20:25:12
77 2 2023-02-11 04:01:00
78 2 2023-02-11 04:01:00
Vladimir Botka
  • 58,131
  • 4
  • 32
  • 63
1

By following @U880D suggestion in the comments, and by following this example, I have solved the problem by developing a custom ansible module.

The module name is ansible_custom_module__sort_csv.py and works like this:

Given in input

  • path: (string) the path of the csv file whose columns I want to be sorted in a specific order
  • columns_names: (list) the titles of the columns of the csv, in the order they are displayed in the csv from left to right
  • columns_to_order_by: (list) the titles of the columns of the csv, in the order I want the rows to be sorted

the module access the csv, read its content and then rewrites its content ordering the rows by columns_to_order_by.

In order to use this module, in my playbook I have added a task right afterwards the task Dump data into csv, this one:

- name: order the columns of the csv
  ansible_custom_module__sort_csv: 
    path: "/tmp/data_{{ '%Y-%m-%d' | strftime }}.csv"
    columns_names: "{{ ['idm', 'idc', 'day', 'time'] }}"
    columns_to_order_by: "{{ ['day', 'time', 'idm', 'idc'] }}"
  
  delegate_to: localhost
  register: csv_content
  run_once: yes
  register: result

Then, in the same directory where my playbook is, I have created a new directory named library,
and inside it I have placed my custom ansible module ansible_custom_module__sort_csv.py.

enter image description here

Here is its content:

#!/usr/bin/python

from __future__ import (absolute_import, division, print_function)
__metaclass__ = type

from ansible.module_utils.basic import AnsibleModule

# specific for csv and sorting 
import csv, operator


def run_module():
    module_args = dict(

    # inputs of the module
    #-----------------------

    # path of the csv file to sort
    path = dict(type='str', required=True),

    # list with the titles of all the columns of the csv
    columns_names = dict(type='list', required=True),

    # list with the titles the columns of the csv you want it ordered them by
    columns_to_order_by = dict(type='list', required=True),
    
    )

    # this is the output of the playbook for each host
    result = dict(
    changed=False
    )

    # options which state syntax rules for calling the module
    module = AnsibleModule(
    argument_spec=module_args,
    supports_check_mode=True
    )

    # # define the output of the playbook for each hosts, according to what happens into the play
    # result['changed'] = False

    # python is called on the target machine
    #---------------------------------------

    csv_file_path = module.params['path']
    fieldnames = module.params['columns_names']
    items = tuple ( module.params['columns_to_order_by'] )  

    with open(csv_file_path, 'r') as csvfile:
    spamreader = csv.DictReader(csvfile, delimiter=";")
    sortedlist = sorted(spamreader, key=operator.itemgetter(*items), reverse=False)

    with open(csv_file_path, 'w') as f:
    # titles of the new csv that gets generated with the sorted data coming from the previous one        
    writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter=";")  # my excel reads ';' as column separator for CSVs
    writer.writeheader()
    for row in sortedlist:
        writer.writerow(row)

    # the csv gets overwritten, so there are changes on the host machine
    result['changed'] = True

    module.exit_json(**result)

def main():
    run_module()

if __name__ == '__main__':
    main()

And in the end I get the csv file with rows sorted as I wished.

Tms91
  • 3,456
  • 6
  • 40
  • 74