I have an XML file (a sitemap using Google's <image:image> extensions) that I need to validate against the two local XSD files, but validation fails because <url> doesn't allow <image:image> as a child. The full error message is:
org.xml.sax.SAXParseException:
cvc-complex-type.2.4.a: Invalid content was found starting with element 'image:image'.
One of '{"http://www.sitemaps.org/schemas/sitemap/0.9":lastmod,
"http://www.sitemaps.org/schemas/sitemap/0.9":changefreq,
"http://www.sitemaps.org/schemas/sitemap/0.9":priority}'
is expected.
Here's the sitemap XML I'm trying to validate:
<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://example.com/index.html</loc>
    <image:image>
      <image:loc>http://example.com/images/mysite.jpg</image:loc>
      <image:title>My Site's Logo</image:title>
      <image:caption>Logo for My Site by Andy Warhol (not really)</image:caption>
    </image:image>
  </url>
  ...
</urlset>
I'm using the standard XSDs for sitemaps and Google Images, but since neither references the other, I don't see how to make <image:image> a valid child of <url>.
If it helps, here is the code that performs the validation.
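import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;

Source document = ...
// Load both local XSDs; the second constructor argument is the system ID of each source.
StreamSource[] source = new StreamSource[] {
    new StreamSource(this.getClass().getResourceAsStream("sitemap.xsd"), "http://www.sitemaps.org/schemas/sitemap/0.9"),
    new StreamSource(this.getClass().getResourceAsStream("sitemap-image.xsd"), "http://www.google.com/schemas/sitemap-image/1.1")
};
// Compile the two schemas into one Schema and validate the document against it.
SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(source)
    .newValidator().validate(document);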
The closest SO question I could find requires pre-parsing and splitting up the XML file because the schema to apply varies based on data values. My requirement is much simpler and, I would hope, much easier to solve.
Update: I had an old version of the schema, which didn't allow any other children in the <url> element. sitemaps.org has since updated its XSD to add:
<xsd:any namespace="##other" minOccurs="0" maxOccurs="unbounded" processContents="strict"/>
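For anyone who wants to see the effect of that wildcard in isolation, here is a minimal sketch. The two schemas in it are heavily simplified stand-ins that I inlined as strings for illustration; they are not the real sitemaps.org or Google XSDs, and the class and method names (SitemapWildcardDemo, sitemapXsd, isValid) are made up. It compiles both schemas together, as in my code above, once with the xsd:any wildcard inside <url> and once without it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SitemapWildcardDemo {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";
    static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";
    static final String IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1";

    // Simplified stand-in for sitemap.xsd (assumption: only <loc> is modelled).
    // The wildcard line is the one quoted from the updated sitemaps.org schema.
    static String sitemapXsd(boolean withWildcard) {
        return "<xsd:schema xmlns:xsd='" + XSD_NS + "' targetNamespace='" + SITEMAP_NS + "'"
             + " elementFormDefault='qualified'>"
             + "<xsd:element name='urlset'><xsd:complexType><xsd:sequence>"
             + "<xsd:element name='url' maxOccurs='unbounded'><xsd:complexType><xsd:sequence>"
             + "<xsd:element name='loc' type='xsd:anyURI'/>"
             + (withWildcard
                 ? "<xsd:any namespace='##other' minOccurs='0' maxOccurs='unbounded'"
                   + " processContents='strict'/>"
                 : "")
             + "</xsd:sequence></xsd:complexType></xsd:element>"
             + "</xsd:sequence></xsd:complexType></xsd:element>"
             + "</xsd:schema>";
    }

    // Simplified stand-in for sitemap-image.xsd: declares a global <image:image> element.
    static final String IMAGE_XSD =
          "<xsd:schema xmlns:xsd='" + XSD_NS + "' targetNamespace='" + IMAGE_NS + "'"
        + " elementFormDefault='qualified'>"
        + "<xsd:element name='image'><xsd:complexType><xsd:sequence>"
        + "<xsd:element name='loc' type='xsd:anyURI'/>"
        + "<xsd:element name='title' type='xsd:string' minOccurs='0'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    // A cut-down version of the sitemap from the question.
    static final String DOCUMENT =
          "<urlset xmlns='" + SITEMAP_NS + "' xmlns:image='" + IMAGE_NS + "'>"
        + "<url><loc>http://example.com/index.html</loc>"
        + "<image:image>"
        + "<image:loc>http://example.com/images/mysite.jpg</image:loc>"
        + "<image:title>My Site Logo</image:title>"
        + "</image:image></url></urlset>";

    // Returns true if DOCUMENT validates against the given sitemap schema plus IMAGE_XSD.
    static boolean isValid(String sitemapXsd) {
        Source[] schemas = {
            new StreamSource(new StringReader(sitemapXsd)),
            new StreamSource(new StringReader(IMAGE_XSD))
        };
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                                  .newSchema(schemas);
        } catch (SAXException e) {
            // A failure here means the schemas themselves are broken, not the document.
            throw new IllegalStateException("schema did not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(DOCUMENT)));
            return true;
        } catch (SAXException e) {
            return false; // validation error, e.g. cvc-complex-type.2.4.*
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("with wildcard:    " + isValid(sitemapXsd(true)));
        System.out.println("without wildcard: " + isValid(sitemapXsd(false)));
    }
}
```

With the wildcard present the document should validate, and without it <image:image> should be rejected, as in my original error. Because processContents is "strict", the validator still has to find a declaration for <image:image>, so the image schema must be passed to newSchema alongside the sitemap schema.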