1

I would like my XSD to be able to enforce that at least one of a set of elements must be present. Some of those elements can appear unlimited times, some are limited.

None of the elements are compulsory by themselves but at least one must be present.

Finally the elements can appear in any order. i.e. the order should neither be enforced nor have any impact on the processing of the document

My attempt at this:

<xs:element name="Root">
  <xs:complexType>
    <xs:choice>
      <xs:element minOccurs="0" maxOccurs="unbounded" name="A" type="a:a"/>
      <xs:element minOccurs="0" maxOccurs="unbounded" name="B" type="a:a"/>
      <xs:element minOccurs="0" maxOccurs="1" name="C" type="a:a"/>
      <xs:element minOccurs="0" maxOccurs="1" name="D" type="a:a"/>
    </xs:choice>
  </xs:complexType>
</xs:element>

This doesn't quite work as it doesn't enforce there being at least one of these elements.

So in the example above there must be at least one of A, B, C or D. There can be multiple A or B elements but only 1 each of C or D

So the following should all be valid:

  • <root><A/></root>
  • <root><A/><A/></root>
  • <root><A/><B/><A/></root>
  • <root><C/></root>
  • <root><D/><C/></root>

But the following should not be valid:

  • <root/> - at least one element must be present
  • <root></root> - at least one element must be present
  • <root><C/><C/></root> - duplicate <C/> elements

There is the added complication that this is an extension on an existing complexType but I don't think that should affect anything?

Is this possible?

I tried adding minOccurs="1" to the xs:choice element. This had no effect, presumably because each instance of the choice can be empty!

I also tried having both an xs:choice (with no minOccurs in the children) and an xs:sequence (without C & D nodes) But I couldn't find any parent element which would allow both to co-exist.

DJL
  • 2,060
  • 3
  • 20
  • 39

1 Answers1

1

In addition to allowing the first two of your should-be-invalid examples, the content model you show also fails by not allowing the third or fifth of your should-be-ok examples.

By far the simplest way to capture the constraints you describe in XSD, Relax NG, or DTDs is to add another and constrain the order. The content model is then (in DTD notation) ((A+, B*, C?, D) | (B+, C?, D?) | (C, D?) | (D)). If the sequence of A, B, C, D children conveys no information, this is by far the simplest approach.

If the sequence of children does convey information, however, and needs to be free, you have a most complicated task. Not particularly difficult, but complicated and sometimes (especially for more complicated examples) rather tedious.

The set of legal sequences of A, B, C, and D children you describe is a regular language; it may help to think about the finite state automaton you would use to recognize it. Essentially, you need to keep track of three bits of information: (a) have we seen any child element at all? (b) have we seen a C element? (c) have we seen a D element? If these were completely independent, we'd need eight states in the FSA; since they aren't, we need only five:

  • 0 start state (no, no, no)
  • 1 have seen one or more A or B elements but not C or D (yes, no, no)
  • 2 have seen a C element, no D (X, yes, no)
  • 3 have seen a D element, no C
  • 4 have seen both C and D

The transition table is

  • 0 A, B -> 1; C -> 2; D -> 3
  • 1 A, B -> 1; C -> 2; D -> 3
  • 2 A, B -> 2; C -> error; D -> 4
  • 3 A, B -> 3; C -> 4; D -> error
  • 4 A, B -> 4; C, D -> error

Accept states are 1-4.

Note that once we have seen at least one element, neither A nor B ever has any effect on the state of the automaton. This suggests a good way to start writing the content model: ignore A and B for a moment and write a content model with only C and D. Since one of these must occur, but neither can repeat, we can describe the legal sequences of C and D elements this way (again, in DTD notation for compactness):

((C, D?) | (D, C?))

Allowing A and B to occur arbitrarily many times (but not at the beginning of the sequence) gives us

( (C, (A|B)*, (D, (A|B)*)?)
| (D, (A|B)*, (C, (A|B)*)?) )

If we begin with either A or B, the initial sequence of As and Bs can be followed by any sequence matching the sequence above, or by nothing. For that case, the content model would be

((A|B)+, ( (C, (A|B)*, (D, (A|B)*)?)
         | (D, (A|B)*, (C, (A|B)*)?) )? )

Putting these together, the content model as a whole becomes

( ( (A|B)+, ( (C, (A|B)*, (D, (A|B)*)?)
            | (D, (A|B)*, (C, (A|B)*)?) )? )
| (C, (A|B)*, (D, (A|B)*)?)
| (D, (A|B)*, (C, (A|B)*)?) )

It's trivial to translate this into XSD notation.

In some schema languages, there are more convenient (by which mostly people mean: more compact) ways to express this or related content models.

In Schematron or XSD 1.1, you can write assertions (using XPath) to say

  • Every child is named A, B, C, or D.
  • There is at least one child.
  • There is at most one child named C.
  • There is at most one child named D.

In XSD 1.1, you can use an all-group whose children are A*, B*, C?, and D?, together with an assertion that says there is at least one child.

In Relax NG, you can write a content model which allows the empty sequence but is otherwise as described using the interleave operator, thus:

(A* & B* & C? & D?)

You can enforce non-emptiness with a slightly more complex model:

( ((A|B), (A* & B* & C? & D?))
| (C, (A* & B* & D?))
| (D, (A* & B* & C?)) )
C. M. Sperberg-McQueen
  • 24,596
  • 5
  • 38
  • 65
  • Wow. Thanks for your very detailed answer. Unfortunately I am not familiar with DTD syntax however if I assume it to be somewhat similar to a regex then I can kind of understand what you have here. Just to clarify something - in my instance the order of elements is totally irrelevant. There should be no restriction on the order nor does it affect processing of the document in any way. The parent element is essentially a mixed bag of stuff – DJL Mar 06 '14 at 09:04
  • I did consider the idea of duplicating the group with all possible orders. However, as hard as this is for 4 element types - in my real-world example there are actually 5 and this will almost certainly increase. Therefore this is not a practical solution - something more generic is definitely required! – DJL Mar 06 '14 at 09:10
  • 1
    Yes, DTD notation is similar to (a subset of) conventional regular expressions, except that comma rather than juxtaposition is used to describe sequences. – C. M. Sperberg-McQueen Mar 08 '14 at 00:51
  • If the order in which children appear does not affect processing of the document in any way, then allowing it to vary does not achieve anything except making the content model harder to express. If a fixed order does the job, I'd use a fixed order; otherwise, I'd move to XSD 1.1. – C. M. Sperberg-McQueen Mar 08 '14 at 00:55
  • Oh, yes, and for an example of the mapping of DTD to XSD notation, see [this related question](http://stackoverflow.com/questions/2290360/xsd-how-to-allow-elements-in-any-order-any-number-of-times). – C. M. Sperberg-McQueen Mar 08 '14 at 01:07