I'm fairly new to Python and coding in general, so sorry if this is a very simple question. I'm working with the Python package xmlschema to validate some very large XML files. When I use the following code to get the error messages, I only get the paths of the errors. This is okay when there are only 5-6 different "Knude" elements, but I have files with 200+ of them, which makes the path alone not very useful. I would therefore like to get the line number so I can go to the XML file and correct it.

Code:

    import xmlschema


    def get_validation_errors(xml_file, xsd_file):
        schema = xmlschema.XMLSchema(xsd_file)
        errors = []
        # iter_errors yields the validation errors one by one
        for idx, validation_error in enumerate(schema.iter_errors(xml_file), start=1):
            errors.append(str(validation_error))
            print(f'[{idx}] path: {validation_error.path} | reason: {validation_error.reason} | message: {validation_error.message}')
        return errors

Results:

[1] path: /KnudeGroup/Knude[5]/StatusKode | reason: value must be one of [1, 2, 3, 4, 8] | message: failed validating 0 with XsdEnumerationFacets([1, 2, 3, 4, 8])

I have already read the documentation and searched Google and Stack Overflow for an answer, but could not find one.

  • I think you can get a `sourceline` from the validation error but not if you use the Python built-in ElementTree (its parser doesn't expose line info), only if you use the lxml ElementTree. Hope that helps, if not, I will try to put together an example later. – Martin Honnen Apr 28 '23 at 09:50
  • I would really like an example. I have tried to get it to work, but lxml won't parse my XML file, so I must be missing something. – Frederikke Kappelhøj Apr 28 '23 at 10:57
  • Before you commented you would like an example I had already posted an answer with sample (Python) code and it looks like you accepted that answer; therefore it is not quite clear what your comment requests in addition to the sample code in the answer. Please clarify or tell us exactly how parsing your XML with lxml failed. – Martin Honnen Apr 28 '23 at 11:52
  • I'm sorry, I got it to work, then it stopped working again, which is why I asked for another example. Your answer worked perfectly once I found a problem with the way I passed the paths. I just forgot that I had asked for another example, I'm sorry. – Frederikke Kappelhøj May 03 '23 at 07:19

1 Answer

Load the XML instance document with lxml; that way each validation error has a `sourceline` property (https://github.com/sissaschool/xmlschema/blob/v2.2.3/xmlschema/validators/exceptions.py#L90). A minimal example:

    import lxml.etree as ET
    from xmlschema import XMLSchema

    xml_doc = ET.parse("sample1.xml")
    schema = XMLSchema("sample1.xsd")

    for error in schema.iter_errors(xml_doc):
        print(f'sourceline: {error.sourceline}; path: {error.path} | reason: {error.reason} | message: {error.message}')

This outputs a line number as `sourceline`, e.g. `sourceline: 2; path: /root/item[1] | reason: invalid value 'a' for xs:decimal | message: failed validating 'a' with XsdAtomicBuiltin(name='xs:decimal')`.
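Applied to the function from the question, the same idea could look roughly like the sketch below. `format_error` is a hypothetical helper name, not part of xmlschema. It assumes `sourceline` may be missing or `None` (for example when the document was not parsed with lxml) and falls back to "line unknown" in that case.

```python
def format_error(idx, error):
    """Build one report line; works with any object exposing path and reason."""
    # sourceline is only expected to be set when the document was parsed with lxml
    line = getattr(error, "sourceline", None)
    where = f"line {line}" if line is not None else "line unknown"
    return f"[{idx}] {where} | path: {error.path} | reason: {error.reason}"


def get_validation_errors(xml_file, xsd_file):
    # Third-party imports are kept local so format_error can be reused on its own
    import lxml.etree as ET
    import xmlschema

    schema = xmlschema.XMLSchema(xsd_file)
    xml_doc = ET.parse(xml_file)  # lxml records the source line of each element
    return [
        format_error(idx, error)
        for idx, error in enumerate(schema.iter_errors(xml_doc), start=1)
    ]
```

Calling `get_validation_errors("sample1.xml", "sample1.xsd")` then returns one string per error, which you can print or write to a log. Both arguments need to be paths that resolve from the current working directory (or absolute paths).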

Martin Honnen