You should, IMO, not assume that your project names are as simple as the ones you have. If a project name becomes long, the program that dumped the YAML might have wrapped the scalar string value for `project` over multiple lines. If the name includes characters that are special to YAML, the program that dumped it will have put single or double quotes around the scalar string. In addition, even when the `-` is on the same line as the key `project`, the value for that key doesn't have to be:
```yaml
- project:
    presentations/demo1
  description: Some description for demo1 project
```
A YAML parser will automatically reconstruct such scalars correctly, something that is very difficult to get right with anything other than a YAML parser.
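To see why, here is a deliberately naive, line-based extractor run over a few valid spellings of a `project` value. The helper `naive_projects` and the sample data are hypothetical, written only for illustration; the extractor simply takes whatever follows `- project:` on the same line:

```python
import re

# Four valid YAML spellings of a `project` value (sample data for illustration).
SAMPLE = """\
- project:
    presentations/demo1
  description: value on the next line
- project: 'presentations/demo: 2'
  description: quoted because of the colon
-
  project: presentations/demo3
  description: dash on a line of its own
- project: a project with a rather
    long name
  description: wrapped plain scalar
"""

# Hypothetical naive extractor: the rest of any line starting with "- project:".
NAIVE = re.compile(r"^- project:(.*)$", re.MULTILINE)


def naive_projects(text):
    return [m.group(1).strip() for m in NAIVE.finditer(text)]


print(naive_projects(SAMPLE))
# ['', "'presentations/demo: 2'", 'a project with a rather']
```

For the same input a YAML parser returns `presentations/demo1`, `presentations/demo: 2`, `presentations/demo3` and `a project with a rather long name`: it misses no entry, drops the quotes and unfolds the wrapped line.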
Fortunately, it is easy to check what you want in Python using a YAML parser:
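```python
from ruamel.yaml import YAML

yaml = YAML(typ='safe')
with open('input.yaml') as fp:
    data = yaml.load(fp)

# every project name must sort strictly before the one that follows it
for idx, d in enumerate(data[:-1]):
    assert d['project'] < data[idx+1]['project']
```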
If you can have projects with the same name, you should be using `<=` instead of `<`. You will have to install ruamel.yaml in your virtualenv (you are using one for development, for sure) using `pip install ruamel.yaml`.
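If you would rather get a yes/no answer than an `AssertionError`, the same comparison fits in a small function. This is a sketch with a hypothetical name, `projects_in_order`; it works on the list of dicts the loader returns, so it can be tried without a YAML file. A function returning a `bool` also keeps working when Python runs with `-O`, which skips `assert` statements:

```python
from operator import le, lt


def projects_in_order(entries, allow_duplicates=False):
    """Return True if `entries` are sorted on their 'project' key.

    `entries` is the list of dicts a YAML loader returns for the document.
    With allow_duplicates=True equal neighbours are accepted (<= instead of <).
    """
    compare = le if allow_duplicates else lt
    names = [entry['project'] for entry in entries]
    # compare every name with the one that follows it
    return all(compare(a, b) for a, b in zip(names, names[1:]))


data = [{'project': 'demo1'}, {'project': 'demo2'}, {'project': 'demo2'}]
print(projects_in_order(data))                         # False: demo2 is repeated
print(projects_in_order(data, allow_duplicates=True))  # True
```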
If you don't just want to check the YAML, but also want to generate a correctly ordered one, you should use:
```python
from ruamel.yaml import YAML

yaml = YAML()  # default round-trip mode: keeps key order and comments
with open('input.yaml') as fp:
    data = yaml.load(fp)

ordered = True
for idx, d in enumerate(data[:-1]):
    if d['project'] > data[idx+1]['project']:
        ordered = False
        break

if not ordered:
    # group the entries by project name, duplicates stay in input order
    project_data_map = {}
    for d in data:
        project_data_map.setdefault(d['project'], []).append(d)
    out_data = []
    for project_name in sorted(project_data_map):
        out_data.extend(project_data_map[project_name])
    with open('output.yaml', 'w') as fp:
        yaml.dump(out_data, fp)
```
This will preserve the order of the keys in the individual mappings/dicts, as well as the comments attached to those mappings.
The `setdefault().append()` keeps any project names that are duplicated in the input as separate entries. So you will have the same number of projects in the output as in the input, even if some of the project names are the same.
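Because Python's `sorted()` is guaranteed to be stable, sorting directly on the `project` key gives the same order as the `setdefault()` grouping: entries with equal names keep their input order. The sketch below compares the two on plain dicts; the function names are hypothetical. With round-trip loaded data, `sorted()` likewise returns a plain list holding the original mappings, which can be dumped the same way:

```python
from operator import itemgetter


def group_sorted(entries):
    """Grouping approach: bucket entries by name, emit buckets in name order."""
    buckets = {}
    for entry in entries:
        buckets.setdefault(entry['project'], []).append(entry)
    out = []
    for name in sorted(buckets):
        out.extend(buckets[name])
    return out


def stable_sorted(entries):
    """sorted() is stable, so entries with equal names keep their input order."""
    return sorted(entries, key=itemgetter('project'))


data = [
    {'project': 'b', 'description': 'first b'},
    {'project': 'a', 'description': 'only a'},
    {'project': 'b', 'description': 'second b'},
]
print([e['description'] for e in stable_sorted(data)])
# ['only a', 'first b', 'second b']
```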