How does an XML parser know where to find a schema file?

Question

I'm working with XML for the first time for a work project. I feel like I've got the basics down but one thing still has me scratching my head. If you're using a schema to designate a namespace, how does an XML parser know where to find the schema file so it can validate what's being fed into it? I get that on one level the only thing that matters is that elements with globally non-unique names be associated with a namespace in which they are unique, but doesn't the parser have to know whether or not the element tag is actually a namespace member? How exactly does that happen given that the naming convention for namespaces is typically a URL that (probably) doesn't have anything to do with the schema in question other than as a unique string of characters? In other words, how does a parser that needs to validate an XML file find the schema(s) associated with that file?

score 2 · Accepted Answer · answered Apr 25 '14 at 08:28

There are many possible mechanisms and it depends which schema processor you are using. Schema processing is sometimes integrated with XML parsing but conceptually it's a separate operation and can be done independently.

One way which many people use, but which I don't like much, is the xsi:schemaLocation attribute where the XML instance document itself defines a mapping from namespace URIs to schema locations. I don't like it because if you're validating a document you shouldn't trust it enough to tell you what schema to use for validation.
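To make the mapping concrete, here is a minimal Python sketch that only reads the hint and performs no validation. The instance document, the namespace URI `http://example.com/orders`, the file name `orders.xsd`, and the helper `schema_location_hints` are all made up for illustration. For simplicity it looks only at the root element, although the attribute may also appear on other elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the xsi:* attributes defined by XML Schema.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance document; the namespace URI and schema file name are invented.
DOC = """<order xmlns="http://example.com/orders"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://example.com/orders orders.xsd">
  <item>widget</item>
</order>"""


def schema_location_hints(xml_text):
    """Return {namespace URI: schema location} from the root's xsi:schemaLocation."""
    root = ET.fromstring(xml_text)
    tokens = root.get(f"{{{XSI_NS}}}schemaLocation", "").split()
    # The value is a whitespace-separated list of (namespace, location) pairs.
    return dict(zip(tokens[0::2], tokens[1::2]))


hints = schema_location_hints(DOC)
print(hints)
```

The result is only a hint supplied by the document being checked, which is exactly why it should not be trusted blindly.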

Most schema processors are likely to have some kind of API or command line interface that allows you to provide schema locations. For example, if you're using Saxon, then it's

...Validate -s:source.xml -xsd:schema.xsd

where schema.xsd is the top-level schema document that includes/imports any other schema documents needed. There's no explicit binding to namespaces here: Saxon will read the schema documents provided and work out which definitions apply to which namespaces.
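The idea of working out which definitions apply to which namespaces can be sketched in a few lines of Python. This is not how Saxon is implemented and it is not real validation: it only reads the schema's targetNamespace, collects the top-level element declarations, and checks whether an instance's root element matches one of them. The schema text, the namespace URI, and the helper names `declared_elements` and `root_is_declared` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema document; the target namespace is invented.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/orders"
           elementFormDefault="qualified">
  <xs:element name="order" type="xs:string"/>
</xs:schema>"""


def declared_elements(xsd_text):
    """Return {(namespace, local name)} for the schema's top-level element declarations."""
    root = ET.fromstring(xsd_text)
    target = root.get("targetNamespace", "")
    # findall with a plain tag matches direct children of <xs:schema> only.
    return {(target, el.get("name")) for el in root.findall(f"{{{XSD_NS}}}element")}


def root_is_declared(xml_text, declared):
    """Check whether the instance's root element is a globally declared element."""
    tag = ET.fromstring(xml_text).tag  # "{namespace}local" or just "local"
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
    else:
        ns, local = "", tag
    return (ns, local) in declared


declared = declared_elements(SCHEMA)
print(root_is_declared('<order xmlns="http://example.com/orders">x</order>', declared))
print(root_is_declared('<order xmlns="http://example.com/other">x</order>', declared))
```

The match is made on the namespace URI as an opaque string. Nothing is ever fetched from that URI, which is why it does not need to point at the schema file.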

neato-burrito! okay, so as I'm understanding you, it's possible for the XML file itself to point to a schema file, but this is undesirable. the preferred method is to point the parser to the file location(s) so that when it reads the namespace tags in the XML file, it can match those with the 'targetNamespace' tags in the schema file and validate. thank you for clearing that up for me! the tutorial docs weren't all that helpful in elucidating how this happened :-p — snerd, Apr 25 '14 at 16:08

