
I'm currently facing a strange JAXB namespace behavior when first unmarshalling and then marshalling an object, when this object has an @XmlAnyElement property.

Here is the setup:

package-info.java

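@XmlSchema(
    namespace = "http://www.example.org",
    elementFormDefault = XmlNsForm.QUALIFIED,
    xmlns = { @XmlNs(prefix = "example", namespaceURI = "http://www.example.org") }
)
package org.example; // hypothetical name; must be the package that contains Message

import javax.xml.bind.annotation.XmlNs;
import javax.xml.bind.annotation.XmlNsForm;
import javax.xml.bind.annotation.XmlSchema;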

Type definition:

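import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.annotation.XmlAnyElement;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;

@XmlRootElement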
@XmlType(namespace="http://www.example.org")
public class Message {

    private String id;

    @XmlAnyElement(lax = true)
    private List<Object> any;

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public List<Object> getAny() {
        if (any == null) {
            any = new ArrayList<>();
        }
        return this.any;
    }
}

and the test code itself:

@Test
public void simpleTest() throws JAXBException {

    JAXBContext jaxbContext = JAXBContext.newInstance(Message.class);
    Marshaller marshaller = jaxbContext.createMarshaller();
    marshaller.setProperty(Marshaller.JAXB_ENCODING, "UTF-8");
    marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
    marshaller.setProperty(Marshaller.JAXB_FRAGMENT, true);
    Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();

    String xml =
            "<example:message xmlns:example=\"http://www.example.org\" xmlns:test=\"http://www.test.org\" xmlns:unused=\"http://www.unused.org\">\n" +
            "   <example:id>id-1</example:id>\n" +
            "   <test:value>my-value</test:value>\n" +
            "   <test:value>my-value2</test:value>\n" +
            "</example:message>";
    System.out.println("Source:\n"+xml);

    // parsed
    Object unmarshalled = unmarshaller.unmarshal(new StringReader(xml));

    // directly convert it back
    StringWriter writer = new StringWriter();
    marshaller.marshal(unmarshalled, writer);
    System.out.println("\n\nMarshalled again:\n"+writer.toString());
}

The problem with this setup is that all 'unknown' namespace declarations are repeated on each of the any elements. This input:

<example:message xmlns:example="http://www.example.org" xmlns:test="http://www.test.org" xmlns:unused="http://www.unused.org">
   <example:id>id-1</example:id>
   <test:value>my-value</test:value>
   <test:value>my-value2</test:value>
</example:message>

becomes this:

<example:message xmlns:example="http://www.example.org">
    <test:value xmlns:test="http://www.test.org" xmlns:unused="http://www.unused.org">my-value</test:value>
    <test:value xmlns:test="http://www.test.org" xmlns:unused="http://www.unused.org">my-value2</test:value>
    <example:id>id-1</example:id>
</example:message>

So how can I avoid this? Why isn't the namespace declared once on the root element, just like in the input XML? Since the namespace of the any element is not known up front, it's not possible to register it via a package definition...

In addition, would it be possible to strip out unused namespaces (on demand)?

Leikingo

1 Answer


When JAXB starts marshalling your objects to XML, it keeps a context that depends on where it currently is in the object hierarchy and in the output XML. Marshalling is a streaming operation, so JAXB only looks at what it is writing at the moment and at that current context.

So say it's starting to marshal your Message instance. It will check what the local element name should be (message), the namespace it must be in (http://www.example.org) and whether there's a specific prefix bound to that namespace (in your case, yes, the example prefix). As long as you're in your Message instance, that's now part of the context. If it encounters further objects in the hierarchy that are within the same namespace, it will already have it in its context and reuse the same prefix, because it knows some parent or ancestor element has it declared. It also checks whether there are any attributes to be marshalled, so it can complete the opening tag. The XML output so far looks like this:

<example:message xmlns:example="http://www.example.org">

Now it starts digging into the fields that have to be marshalled but that aren't attributes. It finds your List<Object> any field and gets to work. The first entry is some object that would get marshalled to a value element in namespace http://www.test.org. That namespace isn't bound to any prefix yet in the current context, so it gets added, and the preferred prefix is found through the package-info annotations (or some other supported method). There's nothing further nested in the value that needs to be marshalled, so it can finish that part and the output now looks like this:

<example:message xmlns:example="http://www.example.org">
    <test:value xmlns:test="http://www.test.org" xmlns:unused="http://www.unused.org">my-value</test:value>

Here the marshalling of the first list entry ends, the value element gets its closing tag, and its context expires. On to the next list entry. It is again an instance of an object that gets marshalled to value, again in the same namespace, but that namespace is no longer in the current context. So the same thing happens.

<example:message xmlns:example="http://www.example.org">
    <test:value xmlns:test="http://www.test.org" xmlns:unused="http://www.unused.org">my-value</test:value>
    <test:value xmlns:test="http://www.test.org" xmlns:unused="http://www.unused.org">my-value2</test:value>

Now it gets around to the String id field, which falls within the same namespace as Message. That one is still known in the current context, because we're still in the message. So that namespace isn't declared again.

<example:message xmlns:example="http://www.example.org">
    <test:value xmlns:test="http://www.test.org" xmlns:unused="http://www.unused.org">my-value</test:value>
    <test:value xmlns:test="http://www.test.org" xmlns:unused="http://www.unused.org">my-value2</test:value>
    <example:id>id-1</example:id>
</example:message>
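The scoping in this walkthrough can be mimicked with a small toy writer. This is only a sketch of the idea, not JAXB's actual implementation: the class name ScopedWriterSketch and its methods are made up for illustration, and it ignores attributes, escaping and the unused prefix. It keeps one prefix map per open element and declares a namespace only when no open ancestor has already bound it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Toy model of a streaming writer's namespace scoping (hypothetical, not JAXB code). */
final class ScopedWriterSketch {
    // One prefix -> URI map per currently open element, innermost first.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();
    private final Deque<String> openNames = new ArrayDeque<>();
    private final StringBuilder out = new StringBuilder();

    /** Opens an element and declares its namespace only if no open ancestor did. */
    ScopedWriterSketch start(String prefix, String uri, String local) {
        Map<String, String> scope = new HashMap<>();
        out.append('<').append(prefix).append(':').append(local);
        if (!inScope(prefix, uri)) {
            scope.put(prefix, uri);
            out.append(" xmlns:").append(prefix).append("=\"").append(uri).append('"');
        }
        out.append('>');
        scopes.push(scope);
        openNames.push(prefix + ":" + local);
        return this;
    }

    ScopedWriterSketch text(String value) {
        out.append(value);
        return this;
    }

    /** Closes the current element; everything it declared goes out of scope. */
    ScopedWriterSketch end() {
        scopes.pop();
        out.append("</").append(openNames.pop()).append('>');
        return this;
    }

    private boolean inScope(String prefix, String uri) {
        for (Map<String, String> scope : scopes) {
            String bound = scope.get(prefix);
            if (bound != null) {
                return bound.equals(uri); // the nearest binding of the prefix wins
            }
        }
        return false;
    }

    String result() {
        return out.toString();
    }

    /** Replays the element order from the walkthrough above. */
    static String demo() {
        String example = "http://www.example.org";
        String test = "http://www.test.org";
        return new ScopedWriterSketch()
            .start("example", example, "message")
            .start("test", test, "value").text("my-value").end()
            .start("test", test, "value").text("my-value2").end()
            .start("example", example, "id").text("id-1").end()
            .end()
            .result();
    }
}
```

Calling ScopedWriterSketch.demo() declares the test prefix on both value elements, because the first declaration is popped together with its element, while example is declared only once on message.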

So why doesn't JAXB just maintain a list of namespaces and their prefix bindings and put those at the root element? Because it's streaming output. It can't just jump back. It could, if it was building a DOM in-memory, but that wouldn't be very efficient.

Conversely, why doesn't it just traverse its object tree first and create a list of namespace bindings to use? Again, because that wouldn't be very efficient. Also, it may simply not be entirely known up-front how the context is going to change during processing. Maybe we'll end up in some package with a different namespace but the same prefix as some other namespace. If in the XML we haven't currently bound anything to that prefix, that's fine. Like here (notice the second test namespace):

<example:message xmlns:example="http://www.example.org">
    <test:value xmlns:test="http://www.test.org">my-value</test:value>
    <test:value xmlns:test="http://completelydifferenttest">my-value2</test:value>
    <example:id>id-1</example:id>
</example:message>

But in other situations it would have to choose some different prefix. Like this semantically equivalent document:

<example:message xmlns:example="http://www.example.org" xmlns:test="http://www.test.org">
    <test:value>my-value</test:value>
    <ns1:value xmlns:ns1="http://completelydifferenttest">my-value2</ns1:value>
    <example:id>id-1</example:id>
</example:message>

So JAXB just looks at things within the currently encompassing context, and locally. It doesn't go digging up-front.
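To check that the two documents above really are equivalent as far as element names go, here is a rough sketch using the StAX reader that ships with the JDK. The class name PrefixEquivalence and its constants are made up for illustration. It relies on QName.equals comparing only the namespace URI and the local part, not the prefix.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Compares documents by expanded element names (hypothetical helper). */
final class PrefixEquivalence {
    // Both documents from the answer, with the "test" prefix rebound locally
    // in the first one and a generated "ns1" prefix in the second one.
    static final String LOCAL =
        "<example:message xmlns:example=\"http://www.example.org\">"
      + "<test:value xmlns:test=\"http://www.test.org\">my-value</test:value>"
      + "<test:value xmlns:test=\"http://completelydifferenttest\">my-value2</test:value>"
      + "<example:id>id-1</example:id>"
      + "</example:message>";

    static final String HOISTED =
        "<example:message xmlns:example=\"http://www.example.org\" xmlns:test=\"http://www.test.org\">"
      + "<test:value>my-value</test:value>"
      + "<ns1:value xmlns:ns1=\"http://completelydifferenttest\">my-value2</ns1:value>"
      + "<example:id>id-1</example:id>"
      + "</example:message>";

    /** Returns the expanded name of every element in document order. */
    static List<QName> elementNames(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            List<QName> names = new ArrayList<>();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getName());
                }
            }
            return names;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** True if both documents contain the same expanded element names in the same order. */
    static boolean sameElementNames(String left, String right) {
        return elementNames(left).equals(elementNames(right));
    }
}
```

PrefixEquivalence.sameElementNames(PrefixEquivalence.LOCAL, PrefixEquivalence.HOISTED) compares the two variants. The prefixes differ, but a namespace-aware consumer sees the same names.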

That doesn't really solve things yet, however. So here's what you can do.

  • Ignore it. The output, verbose and ugly though it may be, is correct.
  • Apply an XSLT transformation after marshalling to clean up namespaces.
  • Use a custom NamespacePrefixMapper.
  • Marshal to an XMLEventWriter and have it delegate customized events to a standard writer.

The custom mapper is a solution that relies on the JAXB reference implementation, and uses internal classes. So its forward-compatibility can't really be guaranteed. Blaise Doughan explains its use in this answer: https://stackoverflow.com/a/28540700/630136

The last option is a bit more involved. You could write some event writer that outputs all namespaces with default prefix bindings on the root element, and ignores them on subsequent elements when it's a known namespace. You'd effectively be keeping some global context from the start.
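A minimal sketch of that idea follows, assuming the set of prefix bindings is known up front. The class name NamespaceHoister is made up for illustration. To stay independent of JAXB it reads already marshalled XML through an XMLEventReader instead of wrapping the marshaller's output directly, but the filtering of start-element events would be the same inside a delegating XMLEventWriter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Namespace;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

/** Declares known namespaces once on the root element (hypothetical helper). */
final class NamespaceHoister {

    /** knownPrefixes maps prefix -> namespace URI; these are declared on the root only. */
    static String hoist(String xml, Map<String, String> knownPrefixes) {
        try {
            XMLEventFactory factory = XMLEventFactory.newInstance();
            XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            boolean rootWritten = false;
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartDocument()) {
                    continue; // no XML declaration, mirroring JAXB_FRAGMENT in the question
                }
                if (event.isStartElement()) {
                    StartElement start = event.asStartElement();
                    List<Namespace> declarations = new ArrayList<>();
                    if (!rootWritten) {
                        // Global context: bind every known prefix on the root element.
                        for (Map.Entry<String, String> e : knownPrefixes.entrySet()) {
                            declarations.add(factory.createNamespace(e.getKey(), e.getValue()));
                        }
                        rootWritten = true;
                    }
                    // Keep only declarations that the global context does not already cover.
                    Iterator<?> declared = start.getNamespaces();
                    while (declared.hasNext()) {
                        Namespace ns = (Namespace) declared.next();
                        if (!ns.getNamespaceURI().equals(knownPrefixes.get(ns.getPrefix()))) {
                            declarations.add(ns);
                        }
                    }
                    event = factory.createStartElement(
                        start.getName(), start.getAttributes(), declarations.iterator());
                }
                writer.add(event);
            }
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not rewrite XML", e);
        }
    }
}
```

A declaration that binds a known prefix to a different URI is passed through untouched, so a local rebinding like the completelydifferenttest example still produces correct output. The sketch does not try to detect unused namespaces; leaving a prefix out of the map simply means its declarations stay where the marshaller put them.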

The XSLT might be the easiest, though it may require some experimenting to see how the XSLT processor handles it. This one actually did the trick for me:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:example="http://www.example.org" xmlns:test="http://www.test.org" 
    xmlns:unused="http://www.unused.org">
    <xsl:output method="xml" indent="yes" />

    <xsl:template match="node()|@*">
      <xsl:copy>
          <xsl:apply-templates select="node()|@*" />
      </xsl:copy>
    </xsl:template>

    <xsl:template match="/example:message">
        <example:message>
            <xsl:apply-templates select="node()|@*" />
        </example:message>
    </xsl:template>

</xsl:transform>

Note that if I turn the second template into one that matches /* and use the <xsl:copy> approach there, it somehow doesn't work.
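For completeness, here is a sketch of applying that stylesheet from Java with the javax.xml.transform API from the JDK. The class name NamespaceCleanup is made up for illustration, and the stylesheet is inlined as a string only to keep the example self-contained. How the namespace declarations end up being serialized depends on the XSLT processor on the classpath, so treat it as a starting point for experimenting.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Runs the namespace clean-up stylesheet over marshalled XML (hypothetical helper). */
final class NamespaceCleanup {
    // Same templates as the stylesheet above, written more compactly.
    static final String STYLESHEET =
        "<xsl:transform xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\"\n"
      + "    xmlns:example=\"http://www.example.org\" xmlns:test=\"http://www.test.org\"\n"
      + "    xmlns:unused=\"http://www.unused.org\">\n"
      + "  <xsl:output method=\"xml\" indent=\"yes\"/>\n"
      + "  <xsl:template match=\"node()|@*\">\n"
      + "    <xsl:copy><xsl:apply-templates select=\"node()|@*\"/></xsl:copy>\n"
      + "  </xsl:template>\n"
      + "  <xsl:template match=\"/example:message\">\n"
      + "    <example:message><xsl:apply-templates select=\"node()|@*\"/></example:message>\n"
      + "  </xsl:template>\n"
      + "</xsl:transform>";

    /** Transforms the marshaller's output and returns the result as a string. */
    static String cleanUp(String marshalledXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(marshalledXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT clean-up failed", e);
        }
    }
}
```

In the test from the question this would be called as NamespaceCleanup.cleanUp(writer.toString()) after marshalling.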

To marshal from an object and transform the resulting XML in one smooth step, look into the use of the JAXBSource class. It lets you use a JAXB object as the source for an XML transformation.

EDIT: regarding the "unused" namespace. I remember getting a bunch of namespaces that weren't even needed in some JAXB output at some point, and in that case it turned out to be related to @XmlSeeAlso annotations that had been placed on some classes by the XML-to-Java compiler I was using (the starting point was an XML schema). The annotation makes sure that if a class is loaded into a JAXBContext, the classes referenced in @XmlSeeAlso are included. This can make the creation of contexts a lot easier. But a side-effect was that it included a bunch of things that I didn't always need and didn't necessarily always want in the context. I think JAXB will create namespace-prefix mappings for everything it can find at that point.

Speaking of which, this could in fact offer another solution to your problem. If you put an @XmlSeeAlso annotation on your root class, and refer to other classes (or at least the root of sub-hierarchies) that could potentially be used, maybe JAXB will already bind all the namespaces for the encountered packages right at the root. I'm not always a fan of the annotation because I don't think superclasses should refer to implementations, and classes higher in a hierarchy shouldn't have to worry about details of those lower in it. But if it doesn't conflict with your architecture it's worth a shot.

G_H
  • Wow. Great answer. Thanks! The XSLT approach looks tempting, but I think that together with the namespaces not being known up front this could get very ugly as well... The custom mapper would be nice, but unfortunately I'm not using the reference implementation (just the plain JDK, no deps). Thus I think I have to live with the ugliness. *One* last question though: Why is the 'unused' NS added in step 2? It's not needed... Does the marshaller just add all _remaining_ namespaces since it does not know what's coming? – Leikingo Feb 17 '17 at 16:16
  • @Ingo If you're using a standard Java distribution, you're probably using the reference implementation already. Of course, in order to actually be able to import the classes in the internal namespaces you'd need some dependency, but it could be considered "provided". As for the "unused" namespace, I knew I forgot something! Edit coming up. – G_H Feb 17 '17 at 16:20
  • Again... thanks for the quick response and the update. As you can see, I don't have any `@XmlSeeAlso` refs, thus I don't know why this happens. The any element is really an extension, thus others can add content on their own. No schema, no generation, nothing. Just an extension point. Okay... then _they_ should really only send namespace declarations which they are really using in the XML. No cleanup from our side. ;-) – Leikingo Feb 17 '17 at 16:36
  • Regarding the reference implementation...You are right here as well. But somehow I'm not able to execute `marshaller.setProperty("com.sun.xml.bind.namespacePrefixMapper", new MyNamespacePrefixMapper());`. I get a `javax.xml.bind.PropertyException`. But I think this problem would be better placed in a separate thread. – Leikingo Feb 17 '17 at 16:39