Java/DOM: Get the XML content of a node

Question

I am parsing an XML file in Java using the W3C DOM. I am stuck on a specific problem: I can't figure out how to get the whole inner XML of a node.

The node looks like this:

<td><b>this</b> is a <b>test</b></td>

What function do I have to use to get that:

"<b>this</b> is a <b>test</b>"

This post on SO may help to get the inner XML of a node: http://stackoverflow.com/questions/7910474/how-to-get-node-contents-from-jdom — Jeevan, Jun 23 '14 at 10:00

score 4 · Answer 1 · answered Dec 28 '10 at 20:41

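I know this was asked long ago, but for the next person searching (that was me today), this works with JDOM:

// JDOMXPath comes from the Jaxen library (org.jaxen.jdom.JDOMXPath).
// "/td/node()" selects the child nodes of td; plain "/td" would select td itself.
JDOMXPath xpath = new JDOMXPath("/td/node()");
String innerXml = (new XMLOutputter()).outputString(xpath.selectNodes(document));

This passes a list of all child nodes of td into outputString, which serializes them in order.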

score 3 · Accepted Answer · edited May 23 '17 at 12:22


You have to use the transform (XSLT) API, with your <b> node as the node to be transformed, and write the result to a new StreamResult(new StringWriter()). See how-to-pretty-print-xml-from-java.
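A minimal sketch of that approach, assuming the JDK's built-in identity transformer: each child of the node is transformed in turn into the same StringWriter, so the outer tags never appear. The class name TransformerInnerXml and the parseRoot helper are made up for illustration and are not part of any API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TransformerInnerXml {

    /** Serializes every child of the given node with an identity transform. */
    public static String innerXml(Node node) {
        try {
            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Without this, an XML declaration would be written for each child.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                transformer.transform(new DOMSource(children.item(i)), new StreamResult(out));
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }

    /** Hypothetical demo helper: parses a string and returns its root element. */
    public static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element td = parseRoot("<td><b>this</b> is a <b>test</b></td>");
        System.out.println(innerXml(td));
    }
}
```

Serializing the children one by one avoids having to strip the enclosing tags from the output afterwards.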

answered Jan 27 '09 at 20:12 by Pierre

score 1 · Answer 3 · answered Mar 13 '12 at 09:48

What do you think of this? I had the same problem today on Android, and I managed to write a simple serializer:

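import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Concatenates the serialized form of every child of the given node.
private String innerXml(Node node) {
    StringBuilder sb = new StringBuilder();
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
        sb.append(serializeNode(children.item(i)));
    }
    return sb.toString();
}

// Handles text and element nodes only; comments, CDATA sections and
// processing instructions are not supported.
private String serializeNode(Node node) {
    if (node.getNodeType() == Node.TEXT_NODE) {
        return escape(node.getNodeValue());
    }
    StringBuilder sb = new StringBuilder("<").append(node.getNodeName());
    NamedNodeMap attributes = node.getAttributes();
    if (attributes != null) {
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attr = attributes.item(i);
            // The leading space separates each attribute from what precedes it.
            sb.append(' ').append(attr.getNodeName())
              .append("=\"").append(escape(attr.getNodeValue())).append('"');
        }
    }
    NodeList children = node.getChildNodes();
    if (children.getLength() == 0) {
        return sb.append("/>").toString();
    }
    sb.append('>');
    for (int i = 0; i < children.getLength(); i++) {
        sb.append(serializeNode(children.item(i)));
    }
    return sb.append("</").append(node.getNodeName()).append('>').toString();
}

// Escapes characters that must not appear literally in text or attribute values.
private static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;")
            .replace(">", "&gt;").replace("\"", "&quot;");
}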

I say 'thanks a lot'! You saved my time from doing the same thing. Considering performance it is probably the most effective way. — Volodymyr Metlyakov, Aug 08 '18 at 12:59

score 0 · Answer 4 · answered Aug 24 '09 at 16:56


To remove unnecessary tags, code like this can probably be used (serializer is an LSSerializer):

DOMConfiguration config = serializer.getDomConfig();
config.setParameter("canonical-form", true);

But it will not always work, because support for setting "canonical-form" to true is optional for DOM implementations.
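Since support is optional, one way to avoid an exception is to ask the configuration first with canSetParameter. The sketch below assumes an LSSerializer obtained from the DOMImplementationRegistry; the class name CanonicalFormCheck and its two methods are hypothetical names for this example.

```java
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public class CanonicalFormCheck {

    /** Enables canonical-form if the implementation allows it; returns whether it did. */
    public static boolean enableIfSupported(LSSerializer serializer) {
        DOMConfiguration config = serializer.getDomConfig();
        if (config.canSetParameter("canonical-form", Boolean.TRUE)) {
            config.setParameter("canonical-form", Boolean.TRUE);
            return true;
        }
        return false; // not supported by this implementation
    }

    /** Hypothetical demo helper: creates a serializer via the Load and Save module. */
    public static LSSerializer newSerializer() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS ls =
                    (DOMImplementationLS) registry.getDOMImplementation("LS");
            return ls.createLSSerializer();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("canonical-form enabled: " + enableIfSupported(newSerializer()));
    }
}
```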

Oleg Vazhnev

Jason S · Answer 5 · 2009-01-27T21:28:51.407

er... you could also call toString() and just chop off the beginning and end tags, either manually or using regexps.

edit: toString() doesn't do what I expected. The O'Reilly Java & XML book covers the Load and Save module of the Java DOM.

See in particular the LSSerializer which looks very promising. You could either call writeToString(node) and chop off the beginning and end tags, as I suggested, or try to use LSSerializerFilter to not print the top node tags (not sure if that would work; I admit I've never used LSSerializer before.)

The O'Reilly book suggests doing something like this:

 import org.w3c.dom.bootstrap.DOMImplementationRegistry;
 import org.w3c.dom.ls.DOMImplementationLS;
 import org.w3c.dom.ls.LSSerializer;

 DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
 DOMImplementationLS lsImpl =
   (DOMImplementationLS)registry.getDOMImplementation("LS");
 LSSerializer serializer = lsImpl.createLSSerializer();
 String nodeString = serializer.writeToString(node);
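Instead of chopping off the outer tags, the same serializer can be applied to each child node and the results concatenated. This is only a sketch under the assumption that the default JDK DOM implementation is in use; the class name LsInnerXml and the parseRoot helper are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class LsInnerXml {

    /** Serializes the children of the given node, leaving out the node's own tags. */
    public static String innerXml(Node node) {
        LSSerializer serializer;
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS lsImpl =
                    (DOMImplementationLS) registry.getDOMImplementation("LS");
            serializer = lsImpl.createLSSerializer();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
        // Suppress the <?xml ...?> declaration that writeToString adds by default.
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);

        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            sb.append(serializer.writeToString(children.item(i)));
        }
        return sb.toString();
    }

    /** Hypothetical demo helper: parses a string and returns its root element. */
    public static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element td = parseRoot("<td><b>this</b> is a <b>test</b></td>");
        System.out.println(innerXml(td));
    }
}
```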

No? .toString() of my td-Node would just result in "[b: null]" — , Jan 27 '09 at 21:06
Hmm, I guess I got that confused with Javascript + e4x. I meant call the function which just produces the output, then delete the beginning/end tags. — Jason S, Jan 27 '09 at 21:21

score 0 · Answer 6 · answered Jan 27 '09 at 22:13


node.getTextContent();

You ought to be using JDOM or dom4j to handle nodes, if for no other reason than to handle whitespace correctly.
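Note that getTextContent() returns only the concatenated text of the node and its descendants, so the <b> tags from the question are dropped. A small sketch to illustrate (the class name TextContentDemo and the parseRoot helper are made up for this example):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class TextContentDemo {

    /** Hypothetical demo helper: parses a string and returns its root element. */
    public static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element td = parseRoot("<td><b>this</b> is a <b>test</b></td>");
        // Prints "this is a test": the text survives, the markup does not.
        System.out.println(td.getTextContent());
    }
}
```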
