I am using the following to validate a large (5 GB) XML document against an XSD schema:

from lxml.etree import fromstring, XMLSchema

xmlschema = XMLSchema(xmlschema_doc)
root = fromstring(open(myfilepath).read())
xmlschema.assertValid(root)

However, I'm starting to hit out-of-memory errors:

OSError: [Errno 12] Cannot allocate memory

Is there an 'on-the-fly' (streaming) way to do XSD validation of an XML file without having to load everything into memory? If so, how would I do that?

  • Does it need to be in python? Otherwise see here: https://stackoverflow.com/questions/7528249/how-to-validate-very-large-xml-files – liamhawkins Feb 21 '19 at 20:32
  • You're looking for a python sax library – pguardiario Feb 21 '19 at 22:55
  • I don't know what's available in the Python world, but in the Java world Xerces and Saxon can both do on-the-fly validation. (However, with 5GB, even the space occupied by the indexes needed to do ID/key/unique/keyref validation can consume quite a bit of memory, depending of course on the schema.) – Michael Kay Feb 22 '19 at 08:43
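
For what it's worth, here is a minimal sketch of the streaming approach the comments point toward, staying within lxml. It assumes that `etree.iterparse` is given the compiled schema through its `schema` keyword, so validation happens as the file is read, and that each element is discarded once its end tag has been seen. The helper name `validate_streaming` is made up for illustration, and I have not tried this on a 5 GB file.

```python
from lxml import etree


def validate_streaming(source, schema_source):
    """Validate `source` against an XSD while parsing it incrementally.

    Both arguments may be file paths or binary file-like objects.
    Returns (True, None) if the document is valid, else (False, message).
    `validate_streaming` is a hypothetical helper, not part of lxml.
    """
    schema = etree.XMLSchema(etree.parse(schema_source))
    try:
        # Passing schema= asks lxml to validate during the parse itself.
        for _event, elem in etree.iterparse(source, events=("end",), schema=schema):
            # Free the element's content once it has been fully parsed.
            elem.clear()
            # Also drop earlier siblings that are still attached to the parent.
            parent = elem.getparent()
            if parent is not None:
                while elem.getprevious() is not None:
                    del parent[0]
    except etree.LxmlError as exc:
        # Malformed XML and schema violations both surface as lxml errors.
        return False, str(exc)
    return True, None
```

Calling `validate_streaming(myfilepath, "schema.xsd")` (the second path being a placeholder) should then avoid building the whole tree at once. As Michael Kay notes above, identity constraints such as ID/key/keyref may still need a fair amount of memory, depending on the schema.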
