7

I am parsing an XML file in Java using the W3C DOM. I am stuck on one specific problem: I can't figure out how to get the whole inner XML of a node.

The node looks like this:

<td><b>this</b> is a <b>test</b></td>

Which function do I have to use to get this:

"<b>this</b> is a <b>test</b>"
  • This post on SO may help to get the inner XML of a node: http://stackoverflow.com/questions/7910474/how-to-get-node-contents-from-jdom – Jeevan Jun 23 '14 at 10:00

6 Answers

4

I know this was asked long ago, but for the next person searching (that was me today), this works with JDOM:

// "/td/node()" selects the children of <td>; plain "/td" would select <td> itself.
JDOMXPath xpath = new JDOMXPath("/td/node()");
String innerXml = new XMLOutputter().outputString(xpath.selectNodes(document));

This passes a list of all child nodes into outputString, which will serialize them out in order.

Joel P.
3

You have to use the transform/XSLT API (javax.xml.transform): pass the node you want serialized (here each child of <td>, such as a <b> node) as the source to be transformed, and write the result into a new StreamResult(new StringWriter()). See how-to-pretty-print-xml-from-java.
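A minimal sketch of this approach, assuming the goal is the inner XML of the <td> node: each child is run through an identity Transformer (one created without a stylesheet) with the XML declaration omitted. The class name InnerXmlTransformer and the helper parseRoot are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InnerXmlTransformer {

    // Serializes every child of the given node and concatenates the results.
    static String innerXml(Node node) {
        try {
            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                transformer.transform(new DOMSource(children.item(i)), new StreamResult(out));
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }

    // Hypothetical helper: parses a string and returns its root element.
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Calling InnerXmlTransformer.innerXml(InnerXmlTransformer.parseRoot("<td><b>this</b> is a <b>test</b></td>")) should give back the markup between the <td> tags, with text nodes copied as they are.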

Community
Pierre
1

What do you think of this? I had the same problem today on Android, and I managed to write a simple serializer:

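// Returns the serialized children of the given node (its "inner XML").
private String innerXml(Node node) {
    StringBuilder sb = new StringBuilder();
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
        sb.append(serializeNode(children.item(i)));
    }
    return sb.toString();
}

// Minimal serializer for elements, attributes and text only.
// It does not escape &, < or ", and it does not handle comments,
// CDATA sections or processing instructions correctly.
private String serializeNode(Node node) {
    if (node.getNodeType() == Node.TEXT_NODE) return node.getTextContent();
    StringBuilder sb = new StringBuilder("<").append(node.getNodeName());
    NamedNodeMap attributes = node.getAttributes();
    if (attributes != null) {
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attr = attributes.item(i);
            // The leading space separates each attribute from the previous token.
            sb.append(' ').append(attr.getNodeName())
              .append("=\"").append(attr.getNodeValue()).append('"');
        }
    }
    NodeList children = node.getChildNodes();
    if (children == null || children.getLength() == 0) {
        return sb.append("/>").toString();
    }
    sb.append('>');
    for (int i = 0; i < children.getLength(); i++) {
        sb.append(serializeNode(children.item(i)));
    }
    return sb.append("</").append(node.getNodeName()).append('>').toString();
}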
Kryštof Hilar
0

To remove unnecessary tags, code like this can probably be used:

DOMConfiguration config = serializer.getDomConfig();
config.setParameter("canonical-form", true);

But it will not always work, because support for setting "canonical-form" to true is optional for DOM implementations.

Oleg Vazhnev
0

Er... you could also call toString() and just chop off the beginning and end tags, either manually or using regexps.

Edit: toString() doesn't do what I expected. The O'Reilly Java & XML book describes the Load and Save module of the Java DOM.

See in particular LSSerializer, which looks very promising. You could either call writeToString(node) and chop off the beginning and end tags, as I suggested, or try to use LSSerializerFilter to skip the top node's tags (not sure if that would work; I admit I've never used LSSerializer before).

The O'Reilly book seems to suggest doing something like this:

 DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
 DOMImplementationLS lsImpl = 
   (DOMImplementationLS)registry.getDOMImplementation("LS");
 LSSerializer serializer = lsImpl.createLSSerializer();
 String nodeString = serializer.writeToString(node);
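As a variation on the snippet above that avoids chopping off tags, here is a rough sketch that serializes each child of the node separately and turns off the XML declaration through the serializer's "xml-declaration" DOM configuration parameter. The class name InnerXmlLs and the helper parseRoot are made up for illustration, and I have only reasoned about this against the DOM implementation bundled with the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

class InnerXmlLs {

    // Serializes the children of the given node with the DOM Load and Save API.
    static String innerXml(Node node) {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS lsImpl =
                    (DOMImplementationLS) registry.getDOMImplementation("LS");
            LSSerializer serializer = lsImpl.createLSSerializer();
            // Without this, writeToString() prepends an <?xml ...?> declaration.
            serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);

            StringBuilder sb = new StringBuilder();
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                sb.append(serializer.writeToString(children.item(i)));
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }

    // Hypothetical helper: parses a string and returns its root element.
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Serializing the children one by one means the top node's own tags never get written, so there is nothing to strip afterwards.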
Jason S
  • No? .toString() of my td-Node would just result in "[b: null]" –  Jan 27 '09 at 21:06
  • Hmm, I guess I got that confused with Javascript + e4x. I meant call the function which just produces the output, then delete the beginning/end tags. – Jason S Jan 27 '09 at 21:21
0

node.getTextContent();

You ought to be using JDOM or dom4j to handle nodes, if for no other reason than to handle whitespace correctly.
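For comparison with what the question asks for, a small sketch of what getTextContent() returns with the W3C DOM: the concatenated text of all descendants, with the <b> tags dropped. The class name TextContentDemo and its helper textOf are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class TextContentDemo {

    // Hypothetical helper: parses a string and returns the root element's text content.
    static String textOf(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            // getTextContent() joins the text of all descendant nodes; markup is not included.
            return root.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

For the node in the question, TextContentDemo.textOf("<td><b>this</b> is a <b>test</b></td>") returns "this is a test", so this only helps when the markup itself is not needed.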