
I am writing a program that will receive a string via stdin. The string will always have the root node `<Exam>` with two child nodes, `<Question>` and `<Answer>`. How can I write a function that validates that the XML is properly formatted (i.e., is not missing any tags or angle brackets)?

I've tried using lxml's `etree`, but I'm running into an error:

```python
from lxml import etree

def isProperlyFormattedXML():
    parser = etree.XMLParser(dtd_validation=True)
    schema_root = etree.XML('''\
        <Exam>
            <Question type="Short Response">
                What does OOP stand for?
            </Question>
            <Answer type="Short Response">
                "Object Oriented programming"
            </Answer>
        </Exam>
        ''')
    # This line raises the XMLSchemaParseError shown below
    schema = etree.XMLSchema(schema_root)

    # Good xml
    parser = etree.XMLParser(schema=schema)
    try:
        root = etree.fromstring("<a>5</a>", parser)
        print("Finished validating good xml")
        return True
    except etree.XMLSyntaxError as err:
        print(err)

    # Bad xml
    parser = etree.XMLParser(schema=schema)
    try:
        root = etree.fromstring("<a>5<b>foobar</b></a>", parser)
    except etree.XMLSyntaxError as err:
        print(err)
        return False
```

Error:

```
lxml.etree.XMLSchemaParseError: The XML document 'in_memory_buffer' is not a schema document.
```
Vismark Juarez
There is a difference between well-formed and valid XML (see https://stackoverflow.com/a/25830482/407651). If `etree.XML()` does not emit any errors, the document is well-formed. The string with `Exam` as the root is an XML document, but it is not an XML Schema document. Unless you actually want to validate with XML Schema, there is no point in trying to use `etree.XMLSchema()`. – mzjn Jan 24 '20 at 07:23
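To illustrate the distinction the comment draws, here is a minimal sketch of what an actual XML Schema for this format could look like, and how it would be used with lxml. The schema rules are assumptions inferred from the example document, not stated requirements: exactly one `<Question>` followed by one `<Answer>`, each with text content and a required `type` attribute. The names `EXAM_XSD`, `EXAM_SCHEMA`, and `is_valid_exam` are hypothetical. Note that the schema document's root is `xs:schema`, not `<Exam>`; handing the `<Exam>` instance document to `etree.XMLSchema()` is what triggers the `XMLSchemaParseError`. The function checks well-formedness first and validity second, so the two kinds of failure stay separate.

```python
from lxml import etree

# Hypothetical XSD for the format in the question. Assumptions: <Exam> holds
# exactly one <Question> followed by one <Answer>, each with text content and
# a required "type" attribute.
EXAM_XSD = b"""\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="TypedText">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="type" type="xs:string" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:element name="Exam">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Question" type="TypedText"/>
        <xs:element name="Answer" type="TypedText"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# The root of this document is xs:schema, so XMLSchema() accepts it.
EXAM_SCHEMA = etree.XMLSchema(etree.XML(EXAM_XSD))


def is_valid_exam(text: str) -> bool:
    """Return True if text is well-formed XML that also matches EXAM_XSD."""
    try:
        # Step 1: well-formedness. Missing tags or angle brackets raise here.
        doc = etree.fromstring(text.encode("utf-8"))
    except etree.XMLSyntaxError as err:
        print(f"Not well-formed: {err}")
        return False
    # Step 2: validity. The document parsed, but does it follow the schema?
    if not EXAM_SCHEMA.validate(doc):
        print(f"Not valid: {EXAM_SCHEMA.error_log}")
        return False
    return True
```

If you only care about missing tags or brackets, step 1 alone is enough; the schema only adds value when you also want to enforce element order, required attributes, and similar rules.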

1 Answer


You already have the solution: parse the string inside a try/except block. If parsing raises a syntax error, the XML is not well-formed.
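A minimal sketch of that try/except approach. It swaps in the standard library's `xml.etree.ElementTree` instead of lxml so it runs without a third-party package; with lxml you would catch `etree.XMLSyntaxError` instead of `ET.ParseError`. The function name `is_properly_formatted_exam` is hypothetical, and the root/child check is an assumption based on the structure described in the question (root `<Exam>` with `<Question>` then `<Answer>`).

```python
import sys
import xml.etree.ElementTree as ET


def is_properly_formatted_exam(text: str) -> bool:
    """Return True if text is well-formed XML shaped like <Exam><Question/><Answer/></Exam>."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # Unclosed elements, missing end tags, stray '<' or '>' all land here.
        print(f"Not well-formed: {err}", file=sys.stderr)
        return False
    # Well-formed; now check the expected shape (assumed: exactly these two
    # children, in this order, under an <Exam> root).
    return root.tag == "Exam" and [child.tag for child in root] == ["Question", "Answer"]
```

To check the string received via stdin, call it as `is_properly_formatted_exam(sys.stdin.read())`.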

JakobDev