I'm trying to parse custom XML file formats with PyXB. So, I first wrote the following XML schema:

<?xml version="1.0"?>                                                           
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">                         
    <xs:element name="outertag" minOccurs="0" maxOccurs="1">                    
        <xs:complexType>                                                        
            <xs:all>                                                            
                <xs:element name="innertag0"                                    
                            minOccurs="0"                                       
                            maxOccurs="unbounded"/>                             
                <xs:element name="innertag1"                                    
                            minOccurs="0"                                       
                            maxOccurs="unbounded"/>                             
            </xs:all>                                                           
        </xs:complexType>                                                       
    </xs:element>                                                               
</xs:schema>

I used the following pyxbgen command to generate the Python module's source, py_schema_module.py:

pyxbgen -m py_schema_module -u schema.xsd

I then wrote the following script for parsing an XML file I call example.xml:

#!/usr/bin/env python2.7                                                        

import py_schema_module                                                         

if __name__ == "__main__":                                                      
    with open("example.xml", "r") as f:                                         
        py_schema_module.CreateFromDocument(f.read())

I use that script to check whether example.xml validates against the schema. For instance, the following example.xml file validates:

<outertag>                                                                      
    <innertag0></innertag0>                                                     
    <innertag1></innertag1>                                                     
</outertag>

So does this:

<outertag>                                                                      
    <innertag1></innertag1>                                                     
    <innertag0></innertag0>                                                     
</outertag>

However, the following syntax is illegal:

<outertag>                                                                      
    <innertag1></innertag1>                                                     
    <innertag0></innertag0>                                                     
    <innertag1></innertag1>                                                     
</outertag>

So is this:

<outertag>                                                                      
    <innertag0></innertag0>                                                     
    <innertag1></innertag1>                                                     
    <innertag0></innertag0>                                                     
</outertag>

I am able to write innertag0 and then innertag1. I am also able to write innertag1 and then innertag0. I can also repeat consecutive instances of innertag0 and innertag1 arbitrarily (examples omitted for brevity). However, what I cannot do is switch back to one element after the other has appeared, as in innertag0, innertag1, innertag0.

Let's assume I want the format to support this functionality. How should I alter my XML schema file?

  • (1) It is not clear what XML is valid, and what XML is invalid. Please clarify. (2) Are you using XSD 1.0 or 1.1? – Yitzhak Khabinsky Mar 02 '20 at 20:15
  • (1) There are four codeblocks at the end. The first two are valid. The final two are invalid. – xmlschemaquestion Mar 02 '20 at 22:16
  • (2) I am not sure what version pyxbgen is using under the hood. How does the answer differ between the two? – xmlschemaquestion Mar 02 '20 at 22:16
  • Complex types can have different kinds of content. Those that allow child elements necessarily have one of xs:sequence, xs:choice, or xs:all as their content models. XML Schema 1.1 supports co-constraints natively. The newly introduced xs:assert element can include conditions specified in XPath 2.0. – Yitzhak Khabinsky Mar 02 '20 at 22:23

2 Answers

The following XML Schema (XSD) 1.0 should cover your use case regardless of the order in which the innertag0 and innertag1 elements appear. The default value for both minOccurs and maxOccurs is 1.

Useful link: XML schema, why xs:group can't be child of xs:all?

XML

<outertag>
    <innertag1></innertag1>
    <innertag0></innertag0>
</outertag>

XSD

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:element name="outertag">
        <xs:complexType>
            <xs:all>
                <xs:element name="innertag0" type="xs:string"/>
                <xs:element name="innertag1" type="xs:string"/>
            </xs:all>
        </xs:complexType>
    </xs:element>
</xs:schema>
Yitzhak Khabinsky
  • Thanks for your response! However, the issue is not that I can't reorder innertag0 and innertag1. The issue is that once I do innertag0 and then innertag1, I can't do innertag0 again. The same applies to the innertag1, innertag0, innertag1 sequence. – xmlschemaquestion Mar 02 '20 at 23:12
  • Please update your original post with more examples of valid and invalid XML instances. – Yitzhak Khabinsky Mar 02 '20 at 23:27

Your schema processor doesn't seem to be doing very careful checking against the spec.

If I try to process your schema as an XSD 1.0 schema with Saxon, it tells me there are four errors:

Error at xs:element on line 3 column 59 of test.xsd:
  Attribute @minOccurs is not allowed on element <xs:element>
Error at xs:element on line 3 column 59 of test.xsd:
  Attribute @maxOccurs is not allowed on element <xs:element>
Error at xs:all on line 5 column 15 of test.xsd:
  Within <xs:all>, an <xs:element> must have @maxOccurs equal to 0 or 1
Error at xs:all on line 5 column 15 of test.xsd:
  Within <xs:all>, an <xs:element> must have @maxOccurs equal to 0 or 1
Schema processing failed: 4 errors were found while processing the schema

The first two say that minOccurs and maxOccurs are not allowed on a global element declaration.

The last two say that maxOccurs must be 0 or 1 within xs:all: XSD 1.0 doesn't allow an element to repeat when the content model is xs:all. Your processor reported an error in the XML instance, but the error is actually in your schema.
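The two rules above can also be checked mechanically. What follows is only a rough sketch using Python's standard library, not a real schema processor and not what Saxon or PyXB do internally; the function name lint_xsd10 and the message wording are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the XML Schema namespace.
XS = "{http://www.w3.org/2001/XMLSchema}"

# The schema from the question, reproduced as a string.
SCHEMA = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="outertag" minOccurs="0" maxOccurs="1">
        <xs:complexType>
            <xs:all>
                <xs:element name="innertag0" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element name="innertag1" minOccurs="0" maxOccurs="unbounded"/>
            </xs:all>
        </xs:complexType>
    </xs:element>
</xs:schema>"""


def lint_xsd10(schema_text):
    """Return messages for the two XSD 1.0 rules discussed above (hypothetical helper)."""
    root = ET.fromstring(schema_text)
    problems = []
    # Rule 1: global declarations (direct children of xs:schema)
    # may not carry occurrence attributes.
    for decl in root.findall(XS + "element"):
        for attr in ("minOccurs", "maxOccurs"):
            if attr in decl.attrib:
                problems.append(
                    "global element %r must not have %s" % (decl.get("name"), attr))
    # Rule 2: in XSD 1.0, an element inside xs:all may only have maxOccurs 0 or 1.
    for group in root.iter(XS + "all"):
        for particle in group.findall(XS + "element"):
            if particle.get("maxOccurs", "1") not in ("0", "1"):
                problems.append(
                    "element %r inside xs:all has maxOccurs=%r"
                    % (particle.get("name"), particle.get("maxOccurs")))
    return problems


for message in lint_xsd10(SCHEMA):
    print(message)
```

Run against the question's schema, this flags the same four spots Saxon complains about. It covers only these two rules, so a clean result does not mean the schema is valid.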

XSD 1.1 does allow multiple occurrences within xs:all. If I correct the global element declaration by deleting @minOccurs and @maxOccurs, the schema is now valid under XSD 1.1, and allows the interleaved instance examples that you were having trouble with.
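To see why the interleaved instances pass under XSD 1.1, it may help to think of xs:all as constraining only how many times each child occurs, never the order. Here is a simplified model of that idea in Python, assuming a flat xs:all with no wildcards or nested groups; matches_all_group and BOUNDS are illustrative names, not part of any library.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# (minOccurs, maxOccurs) for each child of outertag; None stands for "unbounded".
BOUNDS = {"innertag0": (0, None), "innertag1": (0, None)}


def matches_all_group(instance_text, bounds=BOUNDS):
    """Check child element counts against per-element occurrence bounds."""
    counts = Counter(child.tag for child in ET.fromstring(instance_text))
    # Any child not declared in the group is rejected.
    if any(tag not in bounds for tag in counts):
        return False
    # Each declared child must occur within its bounds; order is never consulted.
    for tag, (low, high) in bounds.items():
        n = counts[tag]
        if n < low or (high is not None and n > high):
            return False
    return True


INTERLEAVED = "<outertag><innertag1/><innertag0/><innertag1/></outertag>"
print(matches_all_group(INTERLEAVED))
```

With the unbounded limits from the question's local declarations, the interleaved instance is accepted. Passing bounds of (0, 1) for both elements, which is the most XSD 1.0 permits inside xs:all, makes the same instance fail because innertag1 occurs twice.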

Michael Kay
  • I can have multiple innertag0s or innertag1s in a row. So, that implies my schema is using the XSD1.1 and not XSD1.0 standard. However, I eliminated all instances of minOccurs and maxOccurs, and it still fails. I'm not entirely sure where this leaves us. – xmlschemaquestion Mar 03 '20 at 19:17
  • If you get rid of the min/max attributes on the global element declaration, but leave those on the local element declaration, then you have a valid XSD 1.1 schema and your example instances are schema-valid. – Michael Kay Mar 03 '20 at 22:02