0

I am learning how to use an XML Schema document to generate an XML file which complies with said schema.

I have learned about PyXB to generate Python bindings for data structures defined by an XML Schema.

This SO post presents an "end-to-end" example of how to go about generating an XML file which complies with a given XML Schema. In order to provide information, the post helps digest information in these two links:

Both of these links present a decent explanation of how to use the bindings to generate the required XML file.

Here's the XSD file used:

  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
  <xsd:element name="comment" type="xsd:string"/>
  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element name="shipTo" type="USAddress"/>
      <xsd:element name="billTo" type="USAddress"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="items"  type="Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>
  <xsd:complexType name="USAddress">
    <xsd:sequence>
      <xsd:element name="name"   type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city"   type="xsd:string"/>
      <xsd:element name="state"  type="xsd:string"/>
      <xsd:element name="zip"    type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
  </xsd:complexType>
  <xsd:complexType name="Items">
    <xsd:sequence>
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="productName" type="xsd:string"/>
            <xsd:element name="quantity">
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:maxExclusive value="100"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name="USPrice"  type="xsd:decimal"/>
            <xsd:element ref="comment"   minOccurs="0"/>
            <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name="partNum" type="SKU" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Stock Keeping Unit, a code for identifying products -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>

Here's the command I use to generate the Python bindings:

pyxbgen -u po1.xsd -m po1

This command generates a po1.py file which can then be used to generate an XML document as follows (content of demo2.py):

from __future__ import print_function
import po1 as address

addr = address.USAddress()
addr.name = 'Robert Smith'
addr.street = '8 Oak Avenue'
addr.city = 'Anytown'
addr.state = 'AK'
addr.zip = 12341

with open('demo2.xml', 'w') as f:
    f.write(addr.toxml("utf-8", element_name='USAddress').decode('utf-8'))

Executing this code yields the following well-formed XML file (demo2.xml):

<?xml version="1.0" encoding="utf-8"?>
<USAddress>
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Anytown</city>
    <state>AK</state>
    <zip>12341.0</zip>
</USAddress>

Actual questions:

  1. How can I automate the validation of the generated XML file against the used XSD Schema?
  2. Why (using this validator) does the generated XML file (demo2.xml) fail to be validated against the XML Schema? Specifically, I get the following error: Not valid. Error - Line 1, 50: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 50; cvc-elt.1.a: Cannot find the declaration of element 'USAddress'.
  3. What modifications should I consider adding to demo2.py to produce XML files that do pass the validation against the Schema?
Eduardo
  • 697
  • 8
  • 26

1 Answers1

0

After doing additional reading, I realized that using the Python bindings does NOT guarantee the generated XML document complies with the schema.

Answer 2 and part of 3:

In this case, the validation fails because the Python code is indeed constructing a non-valid XML document with respect to the schema given. Consider a much simpler XSD-XML pair of files:

Simpler XSD:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="a" type="AType"/>
<xsd:complexType name="AType">
  <xsd:sequence>
    <xsd:element name="b" type="xsd:string" />
  </xsd:sequence>
</xsd:complexType>
</xsd:schema>

Simpler Python code using bindings generated for the schema:

from __future__ import print_function
import po2

hey = po2.AType()
hey.b = "Eduardo"

with open('demo4.xml', 'w') as f:
    f.write(hey.toxml("utf-8", element_name='a').decode('utf-8'))

When executing the Python code, this XML is generated:

<?xml version="1.0" encoding="utf-8"?>
<a><b>Eduardo</b></a>

In the original case (USAddress), the original schema presented a root type for the DOM that is not the USAddress type. Yet, the Python code neglected this and went straight ahead to generate a XML file only using an instance of this type. The generated XML is indeed well-formed, yet it does not comply with the schema. In the much simpler case here depicted, not only is the XML valid, but using Python bindings also yields an XML file that complies with the XSD schema.

Answer 1:

You can automate the validation of the generated file against a given XSD schema using the explanation provided here.

from lxml import etree
from StringIO import StringIO

# Check https://lxml.de/validation.html

f = StringIO('''\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="a" type="AType"/>
<xsd:complexType name="AType">
  <xsd:sequence>
    <xsd:element name="b" type="xsd:string" />
  </xsd:sequence>
</xsd:complexType>
</xsd:schema>
''')
xmlschema_doc = etree.parse(f)
xmlschema = etree.XMLSchema(xmlschema_doc)

valid = StringIO('''\
<?xml version="1.0" encoding="utf-8"?>
<a><b>Eduardo</b></a>
''')
doc = etree.parse(valid)

print('Is XML is valid?', xmlschema.validate(doc))
Eduardo
  • 697
  • 8
  • 26