I use voluptuous a lot to validate YAML description files. The errors are often cumbersome to decipher, especially for regular users.

I am looking for a way to make the errors more readable. One way is to identify which line in the YAML file is at fault.

from voluptuous import Schema 
import yaml 
from io import StringIO

Validate = Schema({
    'name': str,
    'age': int,
})

data = """
name: John
age: oops
"""

# safe_load avoids the Loader argument that recent PyYAML versions require for yaml.load
data = Validate(yaml.safe_load(StringIO(data)))

In the above example, I get this error:

MultipleInvalid: expected int for dictionary value @ data['age']

I would prefer an error like:

Error: validation failed on line 2, data.age should be an integer.

Is there an elegant way to achieve this?

nowox

2 Answers

The problem is that at the API boundary of yaml.load, all representational information about the source has been lost. Validate receives a Python dict and does not know where it came from; the dict itself does not carry this information either.
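To make this concrete, here is a minimal sketch (assuming PyYAML is installed; the sample text is made up for illustration) that contrasts the plain objects returned by loading with the node graph returned by yaml.compose, which keeps position marks:

```python
import yaml

# Hypothetical sample input, not taken from the question verbatim.
text = "name: John\nage: oops\n"

# safe_load returns plain Python objects with no position information.
loaded = yaml.safe_load(text)
print(type(loaded).__name__, loaded)

# compose returns the node graph; every node keeps start and end marks.
root = yaml.compose(text)
for key_node, value_node in root.value:
    # start_mark.line is zero-based
    print(key_node.value, value_node.value, value_node.start_mark.line)
```

The marks are zero-based, so add 1 if you want the line numbers an editor would show.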

You can, however, implement this yourself. voluptuous' Invalid error carries a path which is a list of keys to follow. Having this path, you can parse the YAML again into nodes (which carry representation information) and discover the position of the item:

import yaml

def line_from(path, yaml_input):
  node = yaml.compose(yaml_input)
  for item in path:
    for entry in node.value:
      if entry[0].value == item:
        node = entry[1]
        break
    else: raise ValueError("unknown path element: " + str(item))
  return node.start_mark.line

# demonstrating this on more complex input than yours

data = """
spam:
  egg:
    sausage:
      spam
"""

print(line_from(["spam", "egg", "sausage"], data))
# gives 4 (start_mark.line is zero-based; the leading newline is line 0)

Having this, you can then do

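from io import StringIO
from voluptuous import Invalid

try:
  data = Validate(yaml.safe_load(StringIO(data)))
except Invalid as e:
  line = line_from(e.path, data)
  # path elements may be integers (list indices), so convert them before joining
  path = "data." + ".".join(str(p) for p in e.path)
  print(f"Error: validation failed on line {line} ({path}): {e.error_message}")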

I'll go this far for this answer as it shows you how to discover the origin line of an error. You will probably need to extend this to:

  • handle YAML sequences (my code assumes that every intermediate node is a MappingNode; a SequenceNode has single nodes in its value list instead of key-value tuples)
  • handle MultipleInvalid to issue a message for each inner error
  • rewrite expected int to should be an integer if you really want to (no idea how you'd do that)
  • abort after printing the error
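
As a rough illustration of the first point, here is one way line_from could be extended to step through sequences. This is a sketch, not a tested drop-in: it assumes that path elements for list items are integer indices, and it compares mapping keys by their scalar text, which is a simplification.

```python
import yaml

def line_from(path, yaml_input):
    """Return the zero-based line of the node reached by following path."""
    node = yaml.compose(yaml_input)
    for item in path:
        if isinstance(node, yaml.SequenceNode) and isinstance(item, int):
            # a SequenceNode stores its children directly in .value
            if not 0 <= item < len(node.value):
                raise ValueError(f"index out of range: {item!r}")
            node = node.value[item]
        elif isinstance(node, yaml.MappingNode):
            # a MappingNode stores (key_node, value_node) tuples in .value
            for key_node, value_node in node.value:
                if key_node.value == str(item):  # simplification: compare key text
                    node = value_node
                    break
            else:
                raise ValueError(f"unknown path element: {item!r}")
        else:
            raise ValueError(f"cannot descend into scalar at: {item!r}")
    return node.start_mark.line

# Hypothetical input with a list of mappings.
doc = """\
people:
  - name: John
    age: 42
  - name: Jane
    age: oops
"""

print(line_from(["people", 1, "age"], doc))
```

For the second point, MultipleInvalid exposes its inner errors as a list in its errors attribute, so you can loop over that list and call line_from on each error's path.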
flyx
  • Thanks for this help. I searched a bit and found `ruamel.yaml` which keeps the line numbers : `u._yaml_line_col`. I will play around and get back to this question. – nowox Feb 07 '22 at 16:21
  • I'll give you the answer if you use my answer/edited answer as yours :) – nowox Feb 07 '22 at 16:42
  • @nowox In the spirit of SO, just accept your answer and I'll leave mine as-is. It's better for other people to have both a PyYAML and a ruamel solution available, since not everyone is free to change the YAML implementation they use. And I don't need the points ;) – flyx Feb 07 '22 at 17:20

With the help of flyx I found ruamel.yaml, which provides the line and column of each node in a parsed YAML file. With it, one can get the desired error like this:

from io import StringIO

from ruamel.yaml import YAML
from voluptuous import Invalid, Schema

# The schema needs its own name so that the class below does not shadow it.
Criteria = Schema({
    'name': {
        'firstname': str,
        'lastname': str
    },
    'age': int,
})

data = """
name: 
    firstname: John
    lastname: 12.0
age: 42
"""

class Validate:
    def __init__(self, stream):
        # YAML() defaults to the round-trip loader, which records positions.
        self._yaml = YAML().load(stream)
        # __init__ must not return a value, so store the result instead.
        self.data = self.validate()

    def validate(self):
        try:
            return Criteria(self._yaml)
        except Invalid as e:
            # Descend along the path while the nodes still carry a position.
            node = self._yaml
            for key in e.path:
                if hasattr(node[key], '_yaml_line_col'):
                    node = node[key]
                else:
                    break
            path = '/'.join(str(key) for key in e.path)
            print(f"Error: validation failed on line {node._yaml_line_col.line}:{node._yaml_line_col.col} (/{path}): {e.error_message}")
            return None

data = Validate(StringIO(data)).data

With this I get this error message:

Error: validation failed on line 2:4 (/name): extra keys not allowed
nowox