I am trying to extract an element (as a String) out of an XML document. I have tried both approaches suggested in this SO answer (a similar method is also suggested here) and they both fail to properly account for namespace prefixes that may be defined in some outer-level document.

Using the following code:

// entry point method; see examples of values for the String `s` in the question
public static String stripPayload(String s) throws Exception {
    final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    final Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(s)));

    final XPath xPath = XPathFactory.newInstance().newXPath();
    final String xPathToGetToTheNodeWeWishToExtract = "/*[local-name()='envelope']/*[local-name()='payload']";
    final Node result = (Node) xPath.evaluate(xPathToGetToTheNodeWeWishToExtract, doc, XPathConstants.NODE);
    return nodeToString_A(result); // or: nodeToString_B(result)

}

public static String nodeToString_A(Node node) throws Exception {
    final StringWriter buf = new StringWriter();
    final Transformer xform = TransformerFactory.newInstance().newTransformer();
    xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    xform.setOutputProperty(OutputKeys.STANDALONE, "yes");
    xform.transform(new DOMSource(node), new StreamResult(buf));
    return(buf.toString());
}

public static String nodeToString_B(Node node) throws Exception {
    final Document document = node.getOwnerDocument();
    final DOMImplementationLS domImplLS = (DOMImplementationLS) document.getImplementation();
    final LSSerializer serializer = domImplLS.createLSSerializer();
    final String str = serializer.writeToString(node);
    return str;
}        

If the stripPayload method is passed either of the following strings:

<envelope><payload><a></a><b></b></payload></envelope>

or

<envelope><p:payload xmlns:p='foo'><a></a><b></b></p:payload></envelope>

… both nodeToString_A and nodeToString_B methods work. However, if I pass the following equally valid XML document where the namespace prefix is defined in an outer element:

<envelope xmlns:p='foo'><p:payload><a></a><b></b></p:payload></envelope>

… then both methods fail as they simply emit:

<p:payload><a/><b/></p:payload>

Thus, the output is not namespace-well-formed, because the declaration of the prefix p is left out.

The more complicated example below (which uses namespace prefixes in attributes):

<envelope xmlns:p='foo' xmlns:a='alpha'><p:payload a:attr='dummy'><a></a><b></b></p:payload></envelope>

… actually causes nodeToString_A to fail with an exception whereas at least nodeToString_B produces the invalid:

<p:payload a:attr="dummy"><a/><b/></p:payload>

(where again, the prefixes are not defined).

So my question is:

What is a robust way to extract and stringify an inner XML element in a way that takes care of namespace prefixes that may be defined in some outer element?

Marcus Junius Brutus

1 Answer

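You just need to enable namespace awareness on the DocumentBuilderFactory; it is off by default. Without it, the parser treats p:payload as a plain element name and xmlns:p as an ordinary attribute, so the serializer has no namespace binding to re-declare on the extracted element.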

public static String stripPayload(String s) throws Exception {
    final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setNamespaceAware(true);

    ...
}

For the input in which the prefix is declared on the outer envelope element, the output will be:

<p:payload xmlns:p="foo"><a/><b/></p:payload>
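For completeness, here is a minimal self-contained sketch of the fix, assuming the Transformer-based serializer (nodeToString_A) from the question. The class name PayloadExtractor and the namespaceAware parameter are only illustrative; they are not part of the original code and exist so that both behaviours can be compared side by side.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical wrapper class, used only for this illustration.
class PayloadExtractor {

    // Same XPath as in the question: match on local names, ignoring prefixes.
    private static final String PAYLOAD_XPATH =
            "/*[local-name()='envelope']/*[local-name()='payload']";

    // namespaceAware is an illustrative switch; the fix is to pass true.
    static String stripPayload(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(namespaceAware); // false is the factory default
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xPath = XPathFactory.newInstance().newXPath();
        Node payload = (Node) xPath.evaluate(PAYLOAD_XPATH, doc, XPathConstants.NODE);
        return nodeToString(payload);
    }

    // Serialize a single node with an identity transform, without the XML declaration.
    static String nodeToString(Node node) throws Exception {
        StringWriter buf = new StringWriter();
        Transformer xform = TransformerFactory.newInstance().newTransformer();
        xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        xform.transform(new DOMSource(node), new StreamResult(buf));
        return buf.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml =
                "<envelope xmlns:p='foo'><p:payload><a></a><b></b></p:payload></envelope>";
        System.out.println("namespace-aware:     " + stripPayload(xml, true));
        System.out.println("not namespace-aware: " + stripPayload(xml, false));
    }
}
```

Running main prints the extracted element for both settings, so you can confirm on your own JDK that the xmlns:p declaration only appears when the factory is namespace-aware.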
minus