I have an XML file (a sitemap using Google's <image:image> extensions) that I need to validate against the two local XSD files, but validation fails because <url> doesn't allow <image:image> as a child. The full error message is

org.xml.sax.SAXParseException: 
cvc-complex-type.2.4.a: Invalid content was found starting with element 'image:image'.
One of '{"http://www.sitemaps.org/schemas/sitemap/0.9":lastmod, 
         "http://www.sitemaps.org/schemas/sitemap/0.9":changefreq, 
         "http://www.sitemaps.org/schemas/sitemap/0.9":priority}' 
is expected.

Here's the sitemap XML I'm trying to validate:

<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://example.com/index.html</loc>
    <image:image>
      <image:loc>http://example.com/images/mysite.jpg</image:loc>
      <image:title>My Site's Logo</image:title>
      <image:caption>Logo for My Site by Andy Warhol (not really)</image:caption>
    </image:image>
  </url>
  ...
</urlset>

I'm using the standard XSDs for sitemaps and Google Images, but since neither references the other I don't see how to make <image:image> a valid child of <url>.

If it helps, here is the code that performs the validation.

Source document = ...
StreamSource[] source = new StreamSource[] {
        new StreamSource(this.getClass().getResourceAsStream("sitemap.xsd"), "http://www.sitemaps.org/schemas/sitemap/0.9"),
        new StreamSource(this.getClass().getResourceAsStream("sitemap-image.xsd"), "http://www.google.com/schemas/sitemap-image/1.1")
    };
SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(source)
             .newValidator().validate(document);

The closest SO question I could find requires pre-parsing and splitting up the XML file because the schema to apply varies based on data values. My requirement is much simpler and I would hope much easier to solve.

Update: I had an old version of the schema that didn't allow any other children for the <url> element. sitemaps.org has since updated their XSD to add

<xsd:any namespace="##other" minOccurs="0" maxOccurs="unbounded" processContents="strict"/>
David Harkness

2 Answers

Took me a while to figure out the syntax to do schema validation (Google's own samples don't actually validate against the XSD files):

<urlset  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation=
        "http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
        http://www.google.com/schemas/sitemap-image/1.1 http://www.google.com/schemas/sitemap-image/1.1/sitemap-image.xsd"
         xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
         xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
Peter O.

Actually, the sitemap schema allows any element at that location as long as it is from another namespace and a schema for that namespace is available (since processContents is "strict"). However, your <image:image> data is not valid: <image:caption> must appear before <image:title>.

When I test it on Java 1.6, it validates OK.
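To illustrate both points without touching the network, here is a minimal sketch. The two inline XSD strings are heavily simplified stand-ins that I wrote for this example, not the real sitemaps.org or Google files; they keep only the `##other` wildcard and the caption-before-title ordering. The class name `SitemapValidationSketch` and the helper `isValid` are hypothetical too.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SitemapValidationSketch {
    static final String XS = "http://www.w3.org/2001/XMLSchema";
    static final String SM = "http://www.sitemaps.org/schemas/sitemap/0.9";
    static final String IMG = "http://www.google.com/schemas/sitemap-image/1.1";

    // Simplified stand-in for sitemap.xsd (assumption): <url> holds <loc>,
    // then any number of elements from other namespaces, strictly validated.
    static final String SITEMAP_XSD =
          "<xsd:schema xmlns:xsd='" + XS + "' targetNamespace='" + SM + "' elementFormDefault='qualified'>"
        + "<xsd:element name='urlset'><xsd:complexType><xsd:sequence>"
        + "<xsd:element name='url' maxOccurs='unbounded'><xsd:complexType><xsd:sequence>"
        + "<xsd:element name='loc' type='xsd:anyURI'/>"
        + "<xsd:any namespace='##other' minOccurs='0' maxOccurs='unbounded' processContents='strict'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:sequence></xsd:complexType></xsd:element></xsd:schema>";

    // Simplified stand-in for sitemap-image.xsd (assumption): caption precedes title.
    static final String IMAGE_XSD =
          "<xsd:schema xmlns:xsd='" + XS + "' targetNamespace='" + IMG + "' elementFormDefault='qualified'>"
        + "<xsd:element name='image'><xsd:complexType><xsd:sequence>"
        + "<xsd:element name='loc' type='xsd:anyURI'/>"
        + "<xsd:element name='caption' type='xsd:string' minOccurs='0'/>"
        + "<xsd:element name='title' type='xsd:string' minOccurs='0'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element></xsd:schema>";

    // Wraps the given children in <image:image> inside a one-entry sitemap
    // and validates it against both schemas at once.
    static boolean isValid(String imageChildren) {
        String xml = "<urlset xmlns='" + SM + "' xmlns:image='" + IMG + "'><url>"
            + "<loc>http://example.com/index.html</loc>"
            + "<image:image>" + imageChildren + "</image:image>"
            + "</url></urlset>";
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new Source[] {
                    new StreamSource(new StringReader(SITEMAP_XSD)),
                    new StreamSource(new StringReader(IMAGE_XSD))
                });
            // With no ErrorHandler set, validation errors are thrown as SAXParseException.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String loc = "<image:loc>http://example.com/images/mysite.jpg</image:loc>";
        String caption = "<image:caption>Logo for My Site</image:caption>";
        String title = "<image:title>My Site Logo</image:title>";
        System.out.println("caption then title: " + isValid(loc + caption + title));
        System.out.println("title then caption: " + isValid(loc + title + caption));
    }
}
```

The first call uses the order the image schema expects; the second uses the order from the question, which should be rejected by the sequence in the image schema rather than by the sitemap schema.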

forty-two
  • Wow, sitemaps.org must have changed the schema without changing the revision number--probably because the sitemap structure didn't change. I checked the file I linked against what I've had in my application for over a year and of course they differ. The old one doesn't allow any other elements. "Upgrading" to the latest schema fixed the problem. Thanks! – David Harkness Apr 19 '11 at 17:15
  • You're welcome. The change is at least backwards compatible. Anyhow, schemas are evil--or, rather, (most of) the use of schemas is ;-) – forty-two Apr 19 '11 at 20:33