Ansible: How to access a stored csv, order its rows by certain columns and then save it

Question

In my Ansible playbook there is a task which saves data into a csv at path /tmp/.

The generated csv has 4 columns: 'idm', 'idc', 'day', 'time'.

data example:

idm idc day         time
34  3   2023-02-09  12:57:34
56  6   2023-02-10  20:25:12
78  2   2023-02-11  04:01:00

Now, I would like to build a task to read the csv and edit it, so that I can order its columns like

ORDER BY day, time, idm, idc

and then overwrite the file.

How can I do it?

Note: If possible, I would like to do that by using python, so calling python from inside the playbook.

Here below is the code which generates the csv.

- name: Dump data into csv
  copy:
    dest: /tmp/data_{{ '%Y-%m-%d' | strftime }}.csv          

    content: |
      {{ ['idm', 'idc', 'day', 'time'] | map('trim') | join(';') }}
      {% for host in hosts_list %}
      {% set idm = host.inventory_hostname.split('_')[0].split('-')[1] %}
      {% set idm_padded = '%03d' % idm|int %}
      {% set idm_padded = '"' + idm_padded + '"' %}
      {% set idc = host.inventory_hostname.split('_')[1].upper() %}
      {% if host.grep_output.stdout_lines %}
      {% for line in host.grep_output.stdout_lines %}
      {% set day = line.strip().split(' ')[0] %}
      {% set time = line.strip().split(' ')[1] %}
      {{ [idm_padded, idc, day, time] | map('trim') | join(';') }}
      {% endfor %}
      {% endif %}
      {% endfor %}
      
  vars:
    hosts_list: "{{ ansible_play_hosts | map('extract', hostvars) | list }}"
  delegate_to: localhost
  register: csv_content
  run_once: yes

question update

I found out by reading this thread that csv can be ordered py using pure python like this

with open('unsorted.csv',newline='') as csvfile:
    spamreader = csv.DictReader(csvfile, delimiter=";")
    sortedlist = sorted(spamreader, key=lambda row:(row['column_1'],row['column_2']), reverse=False)


with open('sorted.csv', 'w') as f:
    fieldnames = ['column_1', 'column_2', column_3]
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for row in sortedlist:
        writer.writerow(row)

My question is just: how do I write an ansible task that runs these operations on the saved file?

Do you have sample data handy so that one can get an idea about the content and how do you like to order? As you writing out `idm, idc, day, time` but `ORDER BY day, time, idm, idc`, should the column order change too? — U880D, Feb 09 '23 at 17:50
have you seen https://stackoverflow.com/questions/53799730/how-to-read-csv-file-data-in-ansible-playbook-using-with-lines — Oliver Gaida, Feb 09 '23 at 18:05
i found an open issue for sorting by multiple attributes in jinja2 : https://github.com/jinja2cpp/Jinja2Cpp/issues/111 . But at the top of https://github.com/pallets/jinja/issues/670 there seems to be a workaround for this. — Oliver Gaida, Feb 09 '23 at 18:11
I would to start implementation and testing in several steps. I would also consider to process the data sort outside and separate of Ansible, YAML and Jinja2. First implementing a fast prototype with `shell` module and Linux `sort`, see [Bash sort CSV file by first 4 columns](https://stackoverflow.com/questions/11934809/). After, moving it into a separate Python Custom Module, see [Python: sort CSV by column](https://stackoverflow.com/questions/2100353/). — U880D, Feb 09 '23 at 18:14
Thanks for the update and providing the Python sort. "_How do I write an Ansible task that runs these operations on the saved file?_", as mentioned before, that would be just a simple [Custom Module](https://docs.ansible.com/ansible/latest/dev_guide/developing_modules_general.html#creating-a-module), maybe named `sort_csv.py` which would then be called with `sort_csv:` and at least a parameter `path:`. [Here an example](https://stackoverflow.com/a/75250253/6771046) which could be used as starting point. — U880D, Feb 09 '23 at 19:29

Vladimir Botka · Answer 1 · 2023-02-09T20:38:26.730

Given the CSV file for testing

shell> cat /tmp/data_today.csv 
idm idc day time
34 3 2023-02-09 12:57:34
34 3 2023-02-09 12:56:34
56 6 2023-02-10 20:25:12
56 5 2023-02-10 20:25:12
78 2 2023-02-11 04:01:00
77 2 2023-02-11 04:01:00
78 2 2023-02-08 04:01:00

Declare the variable

  csv_content: "{{ data_today_csv.list }}"

and read the file

    - read_csv:
        path: /tmp/data_today.csv
        delimiter: ' '
      register: data_today_csv

gives

  csv_content:
    - {day: '2023-02-09', idc: '3', idm: '34', time: '12:57:34'}
    - {day: '2023-02-09', idc: '3', idm: '34', time: '12:56:34'}
    - {day: '2023-02-10', idc: '6', idm: '56', time: '20:25:12'}
    - {day: '2023-02-10', idc: '5', idm: '56', time: '20:25:12'}
    - {day: '2023-02-11', idc: '2', idm: '78', time: '04:01:00'}
    - {day: '2023-02-11', idc: '2', idm: '77', time: '04:01:00'}
    - {day: '2023-02-08', idc: '2', idm: '78', time: '04:01:00'}

Declare the list order_by and create the index

  order_by: [day, time, idm, idc]
  index: |
    {% for i in csv_content %}
    - index: {{ order_by|map('extract', i)|join('_') }}
    {% endfor %}

gives

  index: |-
    - index: 2023-02-09_12:57:34_34_3
    - index: 2023-02-09_12:56:34_34_3
    - index: 2023-02-10_20:25:12_56_6
    - index: 2023-02-10_20:25:12_56_5
    - index: 2023-02-11_04:01:00_78_2
    - index: 2023-02-11_04:01:00_77_2
    - index: 2023-02-08_04:01:00_78_2

zip and combine the lists

  csv_index: "{{ csv_content|zip(index|from_yaml)|map('combine') }}"

gives

  csv_index:
    - {day: '2023-02-09', idc: '3', idm: '34', index: '2023-02-09_12:57:34_34_3', time: '12:57:34'}
    - {day: '2023-02-09', idc: '3', idm: '34', index: '2023-02-09_12:56:34_34_3', time: '12:56:34'}
    - {day: '2023-02-10', idc: '6', idm: '56', index: '2023-02-10_20:25:12_56_6', time: '20:25:12'}
    - {day: '2023-02-10', idc: '5', idm: '56', index: '2023-02-10_20:25:12_56_5', time: '20:25:12'}
    - {day: '2023-02-11', idc: '2', idm: '78', index: '2023-02-11_04:01:00_78_2', time: '04:01:00'}
    - {day: '2023-02-11', idc: '2', idm: '77', index: '2023-02-11_04:01:00_77_2', time: '04:01:00'}
    - {day: '2023-02-08', idc: '2', idm: '78', index: '2023-02-08_04:01:00_78_2', time: '04:01:00'}

sort by the attribute index

  csv_sort: "{{ csv_index|sort(attribute='index') }}"

gives

  csv_sort:
    - {day: '2023-02-08', idc: '2', idm: '78', index: '2023-02-08_04:01:00_78_2', time: '04:01:00'}
    - {day: '2023-02-09', idc: '3', idm: '34', index: '2023-02-09_12:56:34_34_3', time: '12:56:34'}
    - {day: '2023-02-09', idc: '3', idm: '34', index: '2023-02-09_12:57:34_34_3', time: '12:57:34'}
    - {day: '2023-02-10', idc: '5', idm: '56', index: '2023-02-10_20:25:12_56_5', time: '20:25:12'}
    - {day: '2023-02-10', idc: '6', idm: '56', index: '2023-02-10_20:25:12_56_6', time: '20:25:12'}
    - {day: '2023-02-11', idc: '2', idm: '77', index: '2023-02-11_04:01:00_77_2', time: '04:01:00'}
    - {day: '2023-02-11', idc: '2', idm: '78', index: '2023-02-11_04:01:00_78_2', time: '04:01:00'}

Use Jinja to write the sorted data

    - debug:
        msg: |
          {% for i in csv_sort %}
          {{ order_by|map('extract', i)|join(' ') }}
          {% endfor %}

gives

  msg: |-
    2023-02-08 04:01:00 78 2
    2023-02-09 12:56:34 34 3
    2023-02-09 12:57:34 34 3
    2023-02-10 20:25:12 56 5
    2023-02-10 20:25:12 56 6
    2023-02-11 04:01:00 77 2
    2023-02-11 04:01:00 78 2

This template works if all columns are in the list order_by. The last example would work also with a subset of columns in order_by.

^{Example of a complete playbook for testing
- hosts: localhost

vars:

order_by: [day, time, idm, idc]
csv_content: "{{ data_today_csv.list }}"
index: |
{% for i in csv_content %}
- index: {{ order_by|map('extract', i)|join('_') }}
{% endfor %}
csv_index: "{{ csv_content|zip(index|from_yaml)|map('combine') }}"
csv_sort: "{{ csv_index|sort(attribute='index') }}"

tasks:

- read_csv:
path: /tmp/data_today.csv
delimiter: ' '
register: data_today_csv
- debug:
var: csv_content|to_yaml

- debug:
var: index
- debug:
var: csv_index|to_yaml
- debug:
var: csv_sort|to_yaml
- debug:
msg: |
{% for i in csv_sort %}
{{ order_by|map('extract', i)|join(' ') }}
{% endfor %}

Example of a playbook to write the sorted file
- hosts: localhost

vars:

delimiter: ' '
csv_content: "{{ data_today_csv.list }}"
order_by: [day, time, idm, idc]
index: |
{% for i in csv_content %}
- index: {{ order_by|map('extract', i)|join('_') }}
{% endfor %}
csv_sort: "{{ csv_content|
zip(index|from_yaml)|
map('combine')|
sort(attribute='index') }}"

tasks:

- read_csv:
path: /tmp/data_today.csv
delimiter: "{{ delimiter }}"
register: data_today_csv

- copy:
dest: /tmp/data_today_order.csv
content: |
{{ _keys|join(delimiter) }}
{% for i in csv_sort %}
{{ _keys|map('extract', i)|join(delimiter) }}
{% endfor %}
vars:
_keys: "{{ csv_content.0.keys()|list }}"

gives
shell> cat /tmp/data_today_order.csv
idm idc day time
78 2 2023-02-08 04:01:00
34 3 2023-02-09 12:56:34
34 3 2023-02-09 12:57:34
56 5 2023-02-10 20:25:12
56 6 2023-02-10 20:25:12
77 2 2023-02-11 04:01:00
78 2 2023-02-11 04:01:00}

Tms91 · Accepted Answer · 2023-02-10T17:31:52.773

By following @U880D suggestion in the comments, and by following this example, I have solved the problem by developing a custom ansible module.

The module name is ansible_custom_module__sort_csv.py and works like this:

Given in input

path: (string) the path of the csv file whose columns I want to be sorted in a specific order
columns_names: (list) the titles of the columns of the csv, in the order they are displayed in the csv from left to right
columns_to_order_by: (list) the titles of the columns of the csv, in the order I want the rows to be sorted

the module access the csv, read its content and then rewrites its content ordering the rows by columns_to_order_by.

In order to use this module, in my playbook I have added a task right afterwards the task Dump data into csv, this one:

- name: order the columns of the csv
  ansible_custom_module__sort_csv: 
    path: "/tmp/data_{{ '%Y-%m-%d' | strftime }}.csv"
    columns_names: "{{ ['idm', 'idc', 'day', 'time'] }}"
    columns_to_order_by: "{{ ['day', 'time', 'idm', 'idc'] }}"
  
  delegate_to: localhost
  register: csv_content
  run_once: yes
  register: result

Then, in the same directory where my playbook is, I have created a new directory named library,
and inside it I have placed my custom ansible module ansible_custom_module__sort_csv.py.

Here is its content:

#!/usr/bin/python

from __future__ import (absolute_import, division, print_function)
__metaclass__ = type

from ansible.module_utils.basic import AnsibleModule

# specific for csv and sorting 
import csv, operator


def run_module():
    module_args = dict(

    # inputs of the module
    #-----------------------

    # path of the csv file to sort
    path = dict(type='str', required=True),

    # list with the titles of all the columns of the csv
    columns_names = dict(type='list', required=True),

    # list with the titles the columns of the csv you want it ordered them by
    columns_to_order_by = dict(type='list', required=True),
    
    )

    # this is the output of the playbook for each host
    result = dict(
    changed=False
    )

    # options which state syntax rules for calling the module
    module = AnsibleModule(
    argument_spec=module_args,
    supports_check_mode=True
    )

    # # define the output of the playbook for each hosts, according to what happens into the play
    # result['changed'] = False

    # python is called on the target machine
    #---------------------------------------

    csv_file_path = module.params['path']
    fieldnames = module.params['columns_names']
    items = tuple ( module.params['columns_to_order_by'] )  

    with open(csv_file_path, 'r') as csvfile:
    spamreader = csv.DictReader(csvfile, delimiter=";")
    sortedlist = sorted(spamreader, key=operator.itemgetter(*items), reverse=False)

    with open(csv_file_path, 'w') as f:
    # titles of the new csv that gets generated with the sorted data coming from the previous one        
    writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter=";")  # my excel reads ';' as column separator for CSVs
    writer.writeheader()
    for row in sortedlist:
        writer.writerow(row)

    # the csv gets overwritten, so there are changes on the host machine
    result['changed'] = True

    module.exit_json(**result)

def main():
    run_module()

if __name__ == '__main__':
    main()

And in the end I get the csv file with rows sorted as I wished.

Ansible: How to access a stored csv, order its rows by certain columns and then save it

question update

2 Answers2