
As per Validating a HUGE XML file: agreed, but I am still confused. How is XML Schema validation even possible with SAX parsing? Schema validation involves going back and forth in the XML, for example to validate key references. Shouldn't the whole XML be available in memory to do that? Sorry for the dumb question :(

Vishal

2 Answers


Validation against a schema can be done with almost zero memory. The UPA constraint ensures that validation against a content model never requires backtracking. You do, of course, need to keep track of your state in the FSM of the content model for every element on the stack; that is, memory proportional to the maximum nesting depth of the document.
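This streaming behaviour is visible in the standard JAXP validation API: feeding the validator a `SAXSource` means it consumes events one at a time and never builds a tree of the instance document. A minimal sketch (the inline schema and instance strings are made up for illustration):

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.InputSource;

public class StreamingValidation {
    // Hypothetical schema: <items> must contain one or more <item> children.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        " <xs:element name='items'>" +
        "  <xs:complexType>" +
        "   <xs:sequence>" +
        "    <xs:element name='item' type='xs:string' maxOccurs='unbounded'/>" +
        "   </xs:sequence>" +
        "  </xs:complexType>" +
        " </xs:element>" +
        "</xs:schema>";

    static final String XML = "<items><item>a</item><item>b</item></items>";

    public static boolean validate(String xml) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
            Validator v = schema.newValidator();
            // SAXSource: the validator receives a stream of start/end-element
            // events; the instance document is never materialised as a tree.
            v.validate(new SAXSource(new InputSource(new StringReader(xml))));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(validate(XML));        // conforms to the content model
        System.out.println(validate("<items/>")); // missing required <item>
    }
}
```

The `StringReader` here is just for self-containment; in practice you would pass a stream over the huge file, and memory use stays bounded by nesting depth (plus the identity-constraint bookkeeping described below).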

ID/IDREF validation is an exception: for this, the processor needs memory proportional to the number of ID and IDREF values encountered. Crudely, the processor remembers all the ID and IDREF values found, and when it gets to the end of the document, checks that no ID appears twice and that every IDREF appears among the IDs. Similarly, for checking unique/key/keyref constraints the processor needs to remember what key values have been found. But the memory needed for this is a lot less than keeping the whole XML in memory.
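The crude scheme described above can be sketched with a SAX handler. Note the simplifying assumption: a real validator learns which attributes are IDs and IDREFs from the schema or DTD, whereas this sketch just treats attributes literally named `id` and `ref` that way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class IdRefCheck {
    // Streaming ID/IDREF bookkeeping: memory grows with the number of
    // ID/IDREF values seen, not with the size of the document.
    public static List<String> check(String xml) throws Exception {
        Set<String> ids = new HashSet<>();
        Set<String> refs = new HashSet<>();
        List<String> errors = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                String id = atts.getValue("id");   // assumed ID attribute
                if (id != null && !ids.add(id)) {
                    errors.add("duplicate ID: " + id);
                }
                String ref = atts.getValue("ref"); // assumed IDREF attribute
                if (ref != null) {
                    refs.add(ref);
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);

        // End of document: every IDREF must resolve to a seen ID.
        for (String r : refs) {
            if (!ids.contains(r)) {
                errors.add("dangling IDREF: " + r);
            }
        }
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check("<doc><a id='x'/><b ref='x'/></doc>"));
        System.out.println(check("<doc><a id='x'/><b ref='y'/></doc>"));
    }
}
```

The same pattern extends to unique/key/keyref: accumulate the key tuples in a set as you stream, and report violations once the scope element closes.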

Michael Kay
  • Thanks for that great answer. What is UPA? FSM is finite state machine I believe. – Vishal Sep 05 '12 at 02:31
  • UPA = unique particle attribution, a constraint in XSD that ensures content models are unambiguous. FSM = finite state machine. – Michael Kay Sep 05 '12 at 07:04

Most parsers must build a schema/DTD tree in memory before starting any validation; after that it's mostly sequential lookups and sometimes a little push, peek and pop.
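The "push, peek and pop" can be illustrated with a toy element-nesting check: the map below stands in for the schema tree built up front (the element names and the event encoding are invented for this sketch, not any parser's real API).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StackCheck {
    // Precomputed "schema": for each element, the set of allowed children.
    // This is the structure built in memory before validation starts.
    static final Map<String, Set<String>> ALLOWED = Map.of(
            "library", Set.of("book"),
            "book", Set.of("title", "author"),
            "title", Set.of(),
            "author", Set.of());

    // Events use a made-up encoding: "+name" = start tag, "-name" = end tag.
    public static boolean check(List<String> events) {
        Deque<String> stack = new ArrayDeque<>();
        for (String e : events) {
            String name = e.substring(1);
            if (e.charAt(0) == '+') {
                // peek at the current parent and look up its content model
                if (!stack.isEmpty()
                        && !ALLOWED.getOrDefault(stack.peek(), Set.of()).contains(name)) {
                    return false;
                }
                stack.push(name);   // descend one level
            } else {
                // pop on end tag; it must match the element we are inside
                if (stack.isEmpty() || !stack.pop().equals(name)) {
                    return false;
                }
            }
        }
        return stack.isEmpty();
    }
}
```

The stack never grows beyond the nesting depth, which is the "memory proportional to depth" point made in the accepted answer.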

Karl-Bjørnar Øie
  • so the schema is loaded as a DOM but the actual XML is not, and the parser keeps going back and forth, loading as much as it wants, correct? – Vishal Sep 04 '12 at 21:49