I have the following code:

public XsdValidator(Resource... xsds) {
    Preconditions.checkArgument(xsds != null);
    try {
        this.xsds = ImmutableList.copyOf(xsds);
        SchemaFactory schemaFactory = SchemaFactory.newInstance(W3C_XML_SCHEMA_NS_URI);
        LOGGER.debug("Schema factory created: {}", schemaFactory);
        StreamSource[] streamSources = streamSourcesOf(xsds);
        LOGGER.debug("StreamSource[] created: {}", streamSources);
        Schema schema = schemaFactory.newSchema(streamSources);
        LOGGER.debug("Schema created: {}", schema);
        validator = schema.newValidator();
        LOGGER.debug("Validator created: {}", validator);
    } catch (Exception e) {
        throw new IllegalArgumentException("Can't build XsdValidator", e);
    }
}

It seems the line schemaFactory.newSchema(streamSources); takes a very long time (30 seconds) to execute against my XSD file.

After many tests on this XSD, it seems it's because I have:

  <xs:complexType name="entriesType">
    <xs:sequence>
      <xs:element type="prov:entryType" name="entry" minOccurs="0" maxOccurs="10000" />
    </xs:sequence>
  </xs:complexType>

The problem is maxOccurs="10000".

With maxOccurs="1" or maxOccurs="unbounded", it is very fast.
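To reproduce the comparison in isolation, a minimal sketch along these lines may help. It assumes the JDK's built-in JAXP implementation. The class name MaxOccursTiming, the helper names, and the simplified schema (xs:string entries instead of prov:entryType) are illustrative assumptions, not part of my real code. Timings will vary with the parser version, and newer parsers may show no difference at all.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

// Hypothetical helper class, used only to time schema compilation.
class MaxOccursTiming {

    // Builds a tiny schema whose only variable part is the maxOccurs value.
    // The entry type is simplified to xs:string (assumption for illustration).
    static String schemaWith(String maxOccurs) {
        return "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
             + "<xs:element name=\"entries\">"
             + "<xs:complexType><xs:sequence>"
             + "<xs:element name=\"entry\" type=\"xs:string\" minOccurs=\"0\""
             + " maxOccurs=\"" + maxOccurs + "\"/>"
             + "</xs:sequence></xs:complexType>"
             + "</xs:element></xs:schema>";
    }

    // Compiles the schema and returns the elapsed wall-clock time in milliseconds.
    static long compileMillis(String maxOccurs) throws Exception {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        StreamSource source =
            new StreamSource(new StringReader(schemaWith(maxOccurs)));
        long start = System.nanoTime();
        Schema schema = factory.newSchema(source);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        if (schema == null) {
            throw new IllegalStateException("Schema was not created");
        }
        return elapsedMillis;
    }

    public static void main(String[] args) throws Exception {
        // Compare the three values mentioned in the question.
        for (String maxOccurs : new String[] {"1", "unbounded", "10000"}) {
            System.out.println("maxOccurs=" + maxOccurs + " -> "
                + compileMillis(maxOccurs) + " ms");
        }
    }
}
```

Running main prints one timing per maxOccurs value, which makes it easy to check whether a given JDK or Xerces version shows the slowdown.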

Can someone tell me what the problem is with using maxOccurs="10000"?

Sebastien Lorber

1 Answer

In my experience, particles bounded by what some may consider "unreasonably" high maxOccurs values are a cause of performance problems (this link is from my browser's favourites).

The underlying cause seems to be memory allocation that grows with the maxOccurs value.

Also, I recall a documentation item stating a threshold value beyond which, for all intents and purposes, the parser treats maxOccurs as unbounded, regardless of what the XSD says (I'll revisit this post if I find it).
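If the large bound is indeed what slows down schema compilation, one possible workaround (a sketch under assumptions, not something I have verified against your schema) is to declare maxOccurs="unbounded" in the XSD and enforce the upper limit in application code after validation. The class name BoundedEntriesCheck, the method withinLimit, and the simplified schema with xs:string entries are hypothetical and only serve the illustration.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: the schema checks structure, the code checks the count.
class BoundedEntriesCheck {

    // Simplified schema (assumption): same shape as the question's type,
    // but with maxOccurs="unbounded" and xs:string entries.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"entries\">"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name=\"entry\" type=\"xs:string\""
        + " minOccurs=\"0\" maxOccurs=\"unbounded\"/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element></xs:schema>";

    // Returns true only if the document is schema-valid AND has at most
    // maxEntries <entry> elements. Malformed XML propagates as an exception.
    static boolean withinLimit(String xml, int maxEntries) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        Schema schema = SchemaFactory
            .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            // With no ErrorHandler set, validation errors are thrown.
            schema.newValidator().validate(new DOMSource(doc));
        } catch (SAXException e) {
            return false;
        }

        // Entries are simple strings here, so every <entry> is a direct child.
        int count = doc.getDocumentElement()
            .getElementsByTagName("entry").getLength();
        return count <= maxEntries;
    }
}
```

For example, withinLimit(xml, 10000) would stand in for the original maxOccurs="10000" constraint. The trade-off is that the limit no longer lives in the XSD, so other consumers of the schema will not see it.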

Petru Gardea
  • The link you provided is only about a RAD (Eclipse) performance issue and not an application performance problem, right? @Petru Gardea – javaPlease42 Dec 03 '13 at 20:51
  • @javaPlease42, not really. If you go through it, you may notice that it is a Xerces problem; some people reported that it still manifests in 2.11 (I cannot confirm that). – Petru Gardea Dec 03 '13 at 22:51