27

I have an XML org.w3c.dom.Node that looks like this:

<variable name="variableName">
    <br /><strong>foo</strong> bar
</variable>

How do I get the <br /><strong>foo</strong> bar part as a String?

Marjan
  • 1
    Note to some of the answers below: Do not use solutions based on text parsing, ever. Consider output like this: `<![CDATA[ <.../> ]]>` – Ondra Žižka Jul 29 '18 at 23:15

10 Answers

47

Same problem. To solve it I wrote this helper function:

public String innerXml(Node node) {
    DOMImplementationLS lsImpl = (DOMImplementationLS)node.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
    LSSerializer lsSerializer = lsImpl.createLSSerializer();
    NodeList childNodes = node.getChildNodes();
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < childNodes.getLength(); i++) {
       sb.append(lsSerializer.writeToString(childNodes.item(i)));
    }
    return sb.toString(); 
}
Andrey M.
  • 1
    This method keeps adding the XML definition tag at the front of the string... is there any way to prevent that, besides simply trimming it off afterwards? – Nyerguds Aug 08 '11 at 09:58
  • 26
    I solved it. The solution to this is to add the line `lsSerializer.getDomConfig().setParameter("xml-declaration", false);` – Nyerguds Aug 08 '11 at 10:27
  • Is it easier to just use XSL?: – Bryn Lewis Aug 14 '20 at 05:32
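Putting the helper above together with the `xml-declaration` setting from Nyerguds's comment gives roughly the following. This is a minimal sketch rather than the answerer's exact code: the class name `InnerXmlLs` and the `parse` helper are made up for illustration, and the exact form of empty elements in the output may vary between serializer implementations.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

final class InnerXmlLs {

    /** Serializes the children of the given node using DOM Load and Save. */
    static String innerXml(Node node) {
        DOMImplementationLS lsImpl = (DOMImplementationLS)
                node.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = lsImpl.createLSSerializer();
        // Suppress the <?xml ...?> declaration the serializer may otherwise emit
        serializer.getDomConfig().setParameter("xml-declaration", false);

        NodeList children = node.getChildNodes();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            sb.append(serializer.writeToString(children.item(i)));
        }
        return sb.toString();
    }

    /** Hypothetical helper: parses a string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<variable name=\"variableName\"><br/><strong>foo</strong> bar</variable>");
        // Prints the children of <variable> without the enclosing tags
        System.out.println(innerXml(doc.getDocumentElement()));
    }
}
```

Note that `getOwnerDocument()` returns null when the node is itself a `Document`, so this sketch expects an element (or other non-document node) as input.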
6

There is no simple method on org.w3c.dom.Node for this. getTextContent() returns the concatenated text of all descendant text nodes, with the markup dropped. getNodeValue() returns the text of the current node if it is an Attribute, CDATA, or Text node. So you would need to serialize the node yourself, using a combination of getChildNodes(), getNodeName() and getNodeValue() to build the string.
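The manual approach described above can be sketched as follows. This is an illustration under stated assumptions, not a complete serializer: the class name `ManualInnerXml` and the helpers `append`, `escape`, and `parse` are hypothetical, and processing instructions, entity references, and namespace handling are deliberately left out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ManualInnerXml {

    /** Returns the markup of all children of the given node, built by hand. */
    static String innerXml(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            append(children.item(i), sb);
        }
        return sb.toString();
    }

    private static void append(Node node, StringBuilder sb) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                sb.append('<').append(node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i);
                    sb.append(' ').append(attr.getNodeName()).append("=\"")
                      .append(escape(attr.getNodeValue(), true)).append('"');
                }
                if (node.hasChildNodes()) {
                    // Recurse into the children, then close the element
                    sb.append('>').append(innerXml(node))
                      .append("</").append(node.getNodeName()).append('>');
                } else {
                    sb.append("/>");
                }
                break;
            }
            case Node.TEXT_NODE:
                sb.append(escape(node.getNodeValue(), false));
                break;
            case Node.CDATA_SECTION_NODE:
                sb.append("<![CDATA[").append(node.getNodeValue()).append("]]>");
                break;
            case Node.COMMENT_NODE:
                sb.append("<!--").append(node.getNodeValue()).append("-->");
                break;
            default:
                // Other node types are skipped in this sketch
                break;
        }
    }

    /** Escapes markup characters; quotes are escaped only inside attribute values. */
    private static String escape(String s, boolean attribute) {
        String out = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return attribute ? out.replace("\"", "&quot;") : out;
    }

    /** Hypothetical helper: parses a string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<variable name=\"variableName\"><br /><strong>foo</strong> bar</variable>");
        System.out.println(innerXml(doc.getDocumentElement())); // <br/><strong>foo</strong> bar
    }
}
```

Writing this by hand means owning the escaping rules, which is why the serializer-based answers are usually the safer choice.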

You can also do it with one of the various XML serialization libraries that exist. There is XStream or even JAXB. This is discussed here: XML serialization in Java?

Ondra Žižka
Robert Diana
5

If you're using jOOX, you can wrap your node in a jQuery-like syntax and just call toString() on it:

$(node).toString();

It uses an identity-transformer internally, like this:

ByteArrayOutputStream out = new ByteArrayOutputStream();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
Source source = new DOMSource(element);
Result target = new StreamResult(out);
transformer.transform(source, target);
return out.toString();
Lukas Eder
3

Extending Andrey M.'s answer, I had to modify the code slightly to get the complete DOM document. When I used only

 NodeList childNodes = node.getChildNodes();

the result did not include the root element. To include the root element (and get the complete .xml document) I used:

 public String innerXml(Node node) {
     // getOwnerDocument() returns null when node is itself a Document
     Document doc = (node instanceof Document) ? (Document) node : node.getOwnerDocument();
     DOMImplementationLS lsImpl = (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
     LSSerializer lsSerializer = lsImpl.createLSSerializer();
     lsSerializer.getDomConfig().setParameter("xml-declaration", false);
     // Serializes the node itself, including its own tags, not just its children
     return lsSerializer.writeToString(node);
 }
Alan
2

If you don't want to resort to external libraries, the following solution might come in handy. If you have a node <parent><child name="Nina"/></parent> and you want to extract the children of the parent element, proceed as follows:

    StringBuilder resultBuilder = new StringBuilder();
    // Get all children of the given parent node
    NodeList children = parent.getChildNodes();
    try {
        // Set up the output transformer
        TransformerFactory transfac = TransformerFactory.newInstance();
        Transformer trans = transfac.newTransformer();
        trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        trans.setOutputProperty(OutputKeys.INDENT, "yes");

        for (int index = 0; index < children.getLength(); index++) {
            Node child = children.item(index);

            // Use a fresh writer per child; reusing one writer would
            // re-append the output of all earlier children each time
            StringWriter stringWriter = new StringWriter();
            // Print the DOM node
            trans.transform(new DOMSource(child), new StreamResult(stringWriter));
            // Append child to end result
            resultBuilder.append(stringWriter.toString());
        }
    } catch (TransformerException e) {
        // Error handling goes here
    }
    return resultBuilder.toString();
Ondra Žižka
AgentKnopf
1

With the last answer, I ran into the problem that the method 'nodeToStream()' is undefined, so here is my version:

    public static String toString(Node node) {
        String xmlString = "";
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            //transformer.setOutputProperty(OutputKeys.INDENT, "yes");

            Source source = new DOMSource(node);

            StringWriter sw = new StringWriter();
            StreamResult result = new StreamResult(sw);

            transformer.transform(source, result);
            xmlString = sw.toString();

        } catch (Exception ex) {
            ex.printStackTrace();
        }

        return xmlString;
    }
MatEngel
1

I want to extend the very good answer from Andrey M.:

It can happen that a node is not serializable, which results in the following exception on some implementations:

org.w3c.dom.ls.LSException: unable-to-serialize-node: 
            unable-to-serialize-node: The node could not be serialized.

I had this issue with the implementation "org.apache.xml.serialize.DOMSerializerImpl.writeToString(DOMSerializerImpl)" running on Wildfly 13.

To solve this issue, I suggest changing the code example from Andrey M. slightly:

private static String innerXml(Node node) {
    DOMImplementationLS lsImpl = (DOMImplementationLS) node.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
    LSSerializer lsSerializer = lsImpl.createLSSerializer();
    lsSerializer.getDomConfig().setParameter("xml-declaration", false);
    NodeList childNodes = node.getChildNodes();
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < childNodes.getLength(); i++) {
        Node innerNode = childNodes.item(i);
        if (innerNode != null) {
            // Test the node type rather than hasChildNodes(): an empty element such as
            // <br/> has no children, and its getNodeValue() is null (appended as "null")
            if (innerNode.getNodeType() == Node.ELEMENT_NODE) {
                sb.append(lsSerializer.writeToString(innerNode));
            } else {
                // Text-like nodes are appended as raw values (not escaped)
                sb.append(innerNode.getNodeValue());
            }
        }
    }
    return sb.toString();
}

I also added the setting from Nyerguds's comment. This works for me on WildFly 13.

Ralph
0

The best solution so far, Andrey M.'s, depends on a specific implementation, which can cause issues in the future. Here is the same approach using only what the JDK gives you for serialization (that is, whatever transformer is configured).

public static String innerXml(Node node) throws Exception
{
        StringWriter writer = new StringWriter();
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

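        // Goes one level deeper than the node itself, which suits a Document argument;
        // for an Element argument, use node.getChildNodes() instead.
        NodeList childNodes = node.getFirstChild().getChildNodes();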
        for (int i = 0; i < childNodes.getLength(); i++) {
            transformer.transform(new DOMSource(childNodes.item(i)), new StreamResult(writer));
        }
        return writer.toString();
}

If you're processing a document rather than a node, you must go one level deeper and use node.getFirstChild().getChildNodes(). To make it more robust, though, you should find the first Element rather than assume the first child is the root. An XML document must have a single root element, but the document node can have other children besides it, such as comments and processing instructions.

        Node rootElement = docRootNode.getFirstChild();
        while (rootElement != null && rootElement.getNodeType() != Node.ELEMENT_NODE)
            rootElement = rootElement.getNextSibling();
        if (rootElement == null)
            throw new RuntimeException("No root element found in given document node.");

        NodeList childNodes = rootElement.getChildNodes();
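Combining the root-element search with the transformer loop gives something like the sketch below. The class name `DocumentInnerXml` and the `parse` helper are hypothetical names used only for illustration, and the checked exceptions are wrapped so the method can be called without a throws clause.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DocumentInnerXml {

    /** Serializes the children of the document's root element. */
    static String innerXml(Document doc) {
        // Skip leading comments and similar nodes until the root element is found
        Node root = doc.getFirstChild();
        while (root != null && root.getNodeType() != Node.ELEMENT_NODE) {
            root = root.getNextSibling();
        }
        if (root == null) {
            throw new IllegalArgumentException("No root element found in given document.");
        }

        StringWriter writer = new StringWriter();
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                // Each transform appends to the same writer
                transformer.transform(new DOMSource(children.item(i)), new StreamResult(writer));
            }
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
        return writer.toString();
    }

    /** Hypothetical helper: parses a string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The leading comment makes the root element the second child of the document
        Document doc = parse("<!-- c --><root><a>1</a>text</root>");
        System.out.println(innerXml(doc));
    }
}
```

`Document.getDocumentElement()` also returns the root element directly, so the loop is only needed when you start from a generic `Node`.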

If I were to recommend a library for this, try jsoup, which is primarily for HTML but works with XML too. I haven't tested the following, though.

Document document = Jsoup.parse(xml, "", Parser.xmlParser());
fileContents.put(Attributes.BODY, document.body().html());
// versus: document.body().outerHtml()
Ondra Žižka
-1

Building on Lukas Eder's solution, we can extract the inner XML, as in .NET, like this:

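    public static String innerXml(Node node, String tag) {
        String xmlstring = toString(node);
        // replaceFirst removes only one match, so strip the opening and closing tags separately
        xmlstring = xmlstring.replaceFirst("^<" + tag + ">", "").replaceFirst("</" + tag + ">$", "");
        return xmlstring;
    }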

public static String toString(Node node){       
    String xmlString = "";
    Transformer transformer;
    try {
        transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        //transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StreamResult result = new StreamResult(new StringWriter());

        xmlString = nodeToStream(node, transformer, result);

    } catch (TransformerConfigurationException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (TransformerFactoryConfigurationError e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (TransformerException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }catch (Exception ex){
        ex.printStackTrace();
    }

    return xmlString;               
}

Ex:

If Node name points to xml with string representation "<Name><em>Chris</em>tian<em>Bale</em></Name>" 
String innerXml = innerXml(name,"Name"); //returns "<em>Chris</em>tian<em>Bale</em>"
Jeevan
-1

Here is an alternative solution to extract the content of an org.w3c.dom.Node. This solution also works if the node content contains no XML tags:

private static String innerXml(Node node) throws TransformerFactoryConfigurationError, TransformerException {
    StringWriter writer = new StringWriter();
    String xml = null;
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    transformer.transform(new DOMSource(node), new StreamResult(writer));
    // now remove the outer tag....
    xml = writer.toString();
    xml = xml.substring(xml.indexOf(">") + 1, xml.lastIndexOf("</"));
    return xml;
}
Ralph