I am trying to write Python code that checks whether a YAML file is indented correctly and flags an error if any inconsistencies exist.

For example, the second occurrence of the key "class" below has 4 spaces before it when it should have 6 spaces (like the first occurrence).

I have dozens of these YAML files with thousands of entries, so I need an automated way to check whether the indentation is incorrect.

How could I achieve this within Python?

students:
  incoming:
  - enrolled: TRUE
    semester: final
    destination:
      name:
      - John Walsh
      - Heather Dunbar
      class:
      - 1258
  - enrolled: TRUE
    semester: final
    destination:
      name:
      - Alfred Flynn
      - Joe Diaz      
    class: ## incorrectly indented entry.
      - 3662

Here's my code:

class_indentation = "      class:"

with open("yamls/students.yaml", "r") as file:
    for line_number, line in enumerate(file, start=1):  
        if class_indentation in line:
          print(f"Indentation for '{class_indentation}' is valid: {line_number}")
          break
        else:
          print(f"Indentation for '{class_indentation}' is NOT valid: {line_number}")
print("Search completed.")
martineau
Kerbol
  • What about using a YAML parser and testing if the file has a separate `class` element which is sibling but not child to `destination` ? As far as I can see, the relative indentation to its valid parent-element (here: `destination`) is important for validating the indentation-space. – hc_dev Feb 09 '22 at 11:37
  • Does this answer your question? [Validating a yaml document in python](https://stackoverflow.com/questions/3262569/validating-a-yaml-document-in-python) – possum Feb 09 '22 at 12:04

1 Answer

TL;DR: use a YAML parser and test the nesting, i.e. check that every class node is a child of destination.

YAML parsers

There are two major YAML parsers for Python:

  • ruamel.yaml, which preserves more of the original (comments, ordering, etc.)
  • pyyaml, the older library from which ruamel.yaml was derived

For simplicity I will use pyyaml below.

Using pyyaml to test valid nesting

I found How to parse deeply nested yaml data structures in python and reused the function from its answer here.

The code below flags invalid indentation whenever a class element is not a child of destination:

import yaml

yaml_text = '''
students:
  incoming:
  - enrolled: TRUE
    semester: final
    destination:
      name:
      - John Walsh
      - Heather Dunbar
      class:
      - 1258
  - enrolled: TRUE
    semester: final
    destination:
      name:
      - Alfred Flynn
      - Joe Diaz      
    class: ## incorrectly indented entry.
      - 3662
'''

def lookup(sk, d, path=None):
    # Recursively yield (path to the value, value) for every occurrence of key sk in d.
    path = [] if path is None else path
    if isinstance(d, dict):
        for k, v in d.items():
            if k == sk:
                yield (path + [k], v)
            yield from lookup(sk, v, path + [k])
    elif isinstance(d, list):
        for item in d:
            # The list item itself becomes a path segment, so it shows up in the output.
            yield from lookup(sk, item, path + [item])


tree_dict = yaml.safe_load(yaml_text)
for segments, value in lookup("class", tree_dict):
    # A correctly nested 'class' key has 'destination' as its parent segment.
    if len(segments) < 2 or segments[-2] != 'destination':
        print("Invalid indentation!  Not child of 'destination':")
    else:
        print("OK:")
    print(f"\tpath-segments: {segments}\n\tvalue: {value}")

Prints:

OK:
    path-segments: ['students', 'incoming', {'enrolled': True, 'semester': 'final', 'destination': {'name': ['John Walsh', 'Heather Dunbar'], 'class': [1258]}}, 'destination', 'class']
    value: [1258]
Invalid indentation!  Not child of 'destination':
    path-segments: ['students', 'incoming', {'enrolled': True, 'semester': 'final', 'destination': {'name': ['Alfred Flynn', 'Joe Diaz']}, 'class': [3662]}, 'class']
    value: [3662]
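
Since the question involves dozens of files, it may also help to report where a misplaced key sits. Below is a minimal sketch, assuming PyYAML's node API (yaml.compose, whose nodes carry a start_mark with 0-based line and column). The helper names find_key_positions and misindented, the shortened SAMPLE text, and the expected column of 6 are illustrative assumptions and not part of the approach above.

```python
import yaml

# Shortened version of the example file (an assumption for illustration).
SAMPLE = """\
students:
  incoming:
  - enrolled: TRUE
    destination:
      name:
      - John Walsh
      class:
      - 1258
  - enrolled: TRUE
    destination:
      name:
      - Alfred Flynn
    class:
      - 3662
"""


def find_key_positions(node, key, found=None):
    """Collect (line, column) for every mapping key equal to `key`.

    Lines are reported 1-based, columns 0-based.
    """
    if found is None:
        found = []
    if isinstance(node, yaml.MappingNode):
        for key_node, value_node in node.value:
            if isinstance(key_node, yaml.ScalarNode) and key_node.value == key:
                mark = key_node.start_mark
                found.append((mark.line + 1, mark.column))
            find_key_positions(value_node, key, found)
    elif isinstance(node, yaml.SequenceNode):
        for item in node.value:
            find_key_positions(item, key, found)
    return found


def misindented(text, key, expected_column):
    """Return the positions of `key` that do not start at `expected_column`."""
    root = yaml.compose(text, Loader=yaml.SafeLoader)
    return [pos for pos in find_key_positions(root, key) if pos[1] != expected_column]


for line, column in misindented(SAMPLE, "class", expected_column=6):
    print(f"line {line}: 'class' starts at column {column}, expected 6")
```

Note that this compares absolute columns, so it only suits files that all share the same layout; the parent check above does not depend on the number of spaces used.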

Using ruamel.yaml to test valid nesting

You can switch to ruamel.yaml without loss of functionality. Simply change the import and the loading call:

# import yaml
from ruamel.yaml import YAML

# tree_dict = yaml.safe_load(yaml_text)
tree_dict = YAML(typ='safe').load(yaml_text)

Alternative: validate YAML using a schema

Alternatively, you can validate your YAML files against a schema, for example using JSON Schema, since YAML can be seen as a superset of JSON.

See Validating a yaml document in python for more.
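
As a rough illustration of the schema idea, here is a minimal sketch assuming the third-party jsonschema package and its Draft7Validator class. The SCHEMA itself, the helper name schema_errors, and the DOCUMENT literal (written as the dict that yaml.safe_load would return for a shortened example) are assumptions made for this example. Setting additionalProperties to False on each incoming entry is what makes a misplaced class key show up as an error.

```python
from jsonschema import Draft7Validator

# Hypothetical schema: 'class' is only allowed (and required) inside 'destination'.
SCHEMA = {
    "type": "object",
    "properties": {
        "students": {
            "type": "object",
            "properties": {
                "incoming": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "enrolled": {"type": "boolean"},
                            "semester": {"type": "string"},
                            "destination": {
                                "type": "object",
                                "properties": {
                                    "name": {"type": "array"},
                                    "class": {"type": "array"},
                                },
                                "required": ["name", "class"],
                                "additionalProperties": False,
                            },
                        },
                        "required": ["destination"],
                        "additionalProperties": False,
                    },
                },
            },
        },
    },
}

# Parsed form of a shortened example file; the second entry has 'class' one level too high.
DOCUMENT = {
    "students": {
        "incoming": [
            {
                "enrolled": True,
                "semester": "final",
                "destination": {"name": ["John Walsh"], "class": [1258]},
            },
            {
                "enrolled": True,
                "semester": "final",
                "destination": {"name": ["Alfred Flynn"]},
                "class": [3662],
            },
        ]
    }
}


def schema_errors(document):
    """Return (path, message) pairs for every schema violation in `document`."""
    validator = Draft7Validator(SCHEMA)
    return [(list(err.absolute_path), err.message) for err in validator.iter_errors(document)]


for path, message in schema_errors(DOCUMENT):
    print(path, message)
```

In practice the document would come from yaml.safe_load on each file, and the reported path tells you which entry is affected.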

hc_dev
  • Thanks. However, I would like to keep the input as formatted initially. So, I need to use ruamel.yaml instead, as it sounds like this will scan over my yaml and validate whether or not the formatting is right or wrong and report an error. How would I practically implement such a check using ruamel.yaml? – Kerbol Feb 09 '22 at 14:50
  • @Kerbol The input will not change. ruamel simply has more features (e.g. it lets you read and modify comments, which usually don't matter, and it preserves the ordering of entries). I clarified that again and added the changes needed to use it instead of pyyaml. See my update. – hc_dev Feb 09 '22 at 16:12
  • Is there a way to use ruamel and simply check IF the indentation matches the following condition: "yml.indent(mapping=2, sequence=4, offset=2)" and then flag True/False based on this? I am just looking for a really simple solution as my Python knowledge is not so great. thanks – Kerbol Feb 11 '22 at 16:47