
I have an XML file and an XML schema, and I want to validate the file against that schema to check that it conforms. I am using Python, but I am open to any other language if there is no suitable Python library.

What are my best options here? My main concern is how quickly I can get this up and running.

Scooby

3 Answers


Definitely lxml.

Define an XMLParser with a predefined schema, load the file with fromstring(), and catch any validation errors:

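from lxml import etree

def validate(xmlparser, xmlfilename):
    """Return True if the file parses and validates against the parser's schema."""
    try:
        with open(xmlfilename, 'r') as f:
            etree.fromstring(f.read(), xmlparser)
        return True
    except (etree.XMLSyntaxError, etree.XMLSchemaError):
        # A validating parser reports schema violations as XMLSyntaxError;
        # XMLSchemaError is kept as a safeguard. A missing file still raises.
        return False

# Compile the schema once and reuse the validating parser for every file
schema_file = 'schema.xsd'
with open(schema_file, 'r') as f:
    schema_root = etree.XML(f.read())

schema = etree.XMLSchema(schema_root)
xmlparser = etree.XMLParser(schema=schema)

filenames = ['input1.xml', 'input2.xml', 'input3.xml']
for filename in filenames:
    if validate(xmlparser, filename):
        print("%s validates" % filename)
    else:
        print("%s doesn't validate" % filename)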

Note about encoding

If the schema file starts with an XML declaration that specifies an encoding (e.g. <?xml version="1.0" encoding="UTF-8"?>), the code above will raise the following error:

Traceback (most recent call last):
  File "<input>", line 2, in <module>
    schema_root = etree.XML(f.read())
  File "src/lxml/etree.pyx", line 3192, in lxml.etree.XML
  File "src/lxml/parser.pxi", line 1872, in lxml.etree._parseMemoryDocument
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

A solution is to open both the schema file and the XML files in binary mode, open(..., 'rb'), so that lxml receives bytes:

[...]
def validate(xmlparser, xmlfilename):
    try:
        with open(xmlfilename, 'rb') as f:
[...]
with open(schema_file, 'rb') as f:
[...]
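
Several comments on this answer ask how to find out where a file failed and how to keep going across many files. A minimal sketch of one way to do that, assuming lxml is installed: parse each file with etree.parse() (which reads bytes, so encoding declarations are not a problem), call schema.validate(), and read schema.error_log. The helper names load_schema, check_file and partition are made up for this illustration and are not part of lxml.

```python
from lxml import etree


def load_schema(xsd_path):
    """Compile an XSD file into a reusable XMLSchema object."""
    return etree.XMLSchema(etree.parse(xsd_path))


def check_file(schema, xml_path):
    """Return a list of error strings; an empty list means the file is valid."""
    try:
        doc = etree.parse(xml_path)
    except etree.XMLSyntaxError as exc:
        # The file is not well-formed XML, so schema validation cannot start.
        return ["not well-formed: %s" % exc]
    if schema.validate(doc):
        return []
    # error_log holds the violations found by the most recent validate() call.
    return ["line %d: %s" % (e.line, e.message) for e in schema.error_log]


def partition(schema, xml_paths):
    """Split paths into a list of valid files and a dict of invalid ones."""
    valid, invalid = [], {}
    for path in xml_paths:
        errors = check_file(schema, path)
        if errors:
            invalid[path] = errors
        else:
            valid.append(path)
    return valid, invalid
```

Because check_file returns errors instead of raising, one invalid file does not stop the loop, and a missing file still raises an OSError rather than being silently reported as invalid.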
Thomas BDX
alecxe
  • It does work, yes. Is there a brief tutorial on it? I passed in the schema and the feed file, and it processed both. How would I know whether the file validated or not? – Scooby Jul 23 '13 at 20:18
  • It's simple. `etree.fromstring` will throw an exception if the xml file doesn't validate. – alecxe Jul 23 '13 at 20:21
  • Wow, that was quick. Now I want to read multiple XML feeds and validate them against the schema. Could I just loop over them with fromstring? 1. Would an exception stop processing and skip the remaining feeds? I want to process all the feed files and, if possible, report where each one failed validation. 2. Also, a feed might contain many records; is there a way to check all of them and split them into those that pass and those that fail validation? – Scooby Jul 23 '13 at 20:26
  • I've updated the code assuming the schema for all xmls is the same - though I think you've got the idea anyway. Please, check. – alecxe Jul 23 '13 at 20:32
  • Works like a charm. Will play with it more ! Thanks. – Scooby Jul 23 '13 at 20:37
  • Just one more general question: does this validate only the structure, or also the permissible values in fields? Also, if I get an error, is there a way to get a more specific message showing exactly where it failed? – Scooby Jul 23 '13 at 20:41
  • It should validate permissible values too. And yes, lxml tells you exactly where the error is; just print the traceback. – alecxe Jul 23 '13 at 21:31
  • I have two files on my schema, one referencing the other. How should I proceed? – fiatjaf Nov 21 '13 at 04:22
  • Super!! It just worked as it is...and serve the complete purpose.. Thanks a lot @alecxe – Bhupesh Pant Feb 13 '14 at 10:20
  • You probably want to restrict the list of caught exceptions. This will return False if the file does not exist - which might be difficult to debug. – charlax May 27 '15 at 17:18
  • @charlax updated, hope you can test it and confirm it is working as expected. Thanks. – alecxe May 27 '15 at 17:26
  • @alecxe It works great for python 2.7. I'm trying to validate the same way in python 3.4. I'm not successful. Is there a way to achieve XSD validation in xml.etree.ElementTree package? – Satish Jonnala Jul 07 '15 at 03:49
  • @SatishJonnala consider making a separate question if you have difficulties with python3.4 specific solution. Throw me a link here. Thanks. – alecxe Jul 08 '15 at 01:26
  • @alecxe http://stackoverflow.com/questions/31273430/python-3-4-how-to-do-xml-validation/ – Satish Jonnala Jul 08 '15 at 16:00
  • Also, this may hang or take additional time if retrieving schemas from the internet. Consider using a catalog: https://blog.frankel.ch/use-local-resources-when-validating-xml/ http://www.xmlsoft.org/catalog.html – Thomas BDX Sep 20 '18 at 15:18

The Python snippet is good, but an alternative is to use xmllint:

xmllint --schema sample.xsd --noout sample.xml
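
If you still want to drive this from Python, here is a rough sketch that shells out to xmllint with the standard subprocess module. It assumes xmllint is on your PATH and that a zero exit status means the file validated. The function names build_xmllint_command and validate_with_xmllint are made up for this illustration.

```python
import shutil
import subprocess


def build_xmllint_command(xsd_path, xml_path):
    """Build the argument list for the xmllint call shown above."""
    return ["xmllint", "--noout", "--schema", xsd_path, xml_path]


def validate_with_xmllint(xsd_path, xml_path):
    """Return (is_valid, messages) for one file, using the xmllint binary."""
    if shutil.which("xmllint") is None:
        raise RuntimeError("xmllint was not found on PATH")
    result = subprocess.run(
        build_xmllint_command(xsd_path, xml_path),
        capture_output=True,
        text=True,
    )
    # Assumption: exit status 0 means valid; diagnostics are read from stderr.
    return result.returncode == 0, result.stderr.strip()
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.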
daparic
  • Just found this googling the same issue--I like this over installing another XML library (I'm using the built-in xml.etree module to generate the XML). – nrlakin Apr 21 '17 at 21:23
  • It takes forever for me to download the schema from OASIS. If it hangs or takes an extra long time, consider using a catalog: https://blog.frankel.ch/use-local-resources-when-validating-xml/ http://www.xmlsoft.org/catalog.html – Thomas BDX Sep 20 '18 at 15:19
import xmlschema


def get_validation_errors(xml_file, xsd_file):
    """Print and return every schema violation found in xml_file."""
    schema = xmlschema.XMLSchema(xsd_file)
    errors = []
    # iter_errors() yields each violation instead of stopping at the first one
    for validation_error in schema.iter_errors(xml_file):
        err = str(validation_error)
        errors.append(err)
        print(err)
    return errors

Vijay Anand Pandian