I have a fairly complex XML data structure. It contains several fragments, as in the following example:
<data>
<content-part-1>
<h1>Hello <strong>World</strong>. This is some text.</h1>
<h2>.....</h2>
</content-part-1>
....
</data>
The h1 element inside 'content-part-1' is the one of interest. I want to get the full content of that h1 element, including any nested markup.
In Java I used javax.xml.parsers.DocumentBuilder and tried something like this:
String my_content="<h1>Hello <strong>World</strong>. This is some text.</h1>";
// parse h1 tag..
DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = documentBuilder.parse(new InputSource(new StringReader(my_content)));
Node node = doc.importNode(doc.getDocumentElement(), true);
if (node != null && node.getNodeName().equals("h1")) {
return node.getTextContent();
}
But the method 'getTextContent()' will return:
Hello World. This is some text.
The "strong" tag is missing from the result, because getTextContent() returns only the concatenated text of the node and its descendants (this is its documented behavior).
My question: how can I extract the full content of a single XML node within an org.w3c.dom.Document, nested tags included, without parsing the node content any further?
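One direction that might work, sketched here as an assumption rather than a confirmed solution: serialize each child of the node back to markup with the JDK's identity Transformer from javax.xml.transform. The class name InnerXml and the helper methods innerXml and innerXmlOfRoot are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InnerXml {

    // Hypothetical helper: writes every child of the given node back out as markup,
    // so nested elements such as <strong> are kept.
    static String innerXml(Node node) throws Exception {
        // A Transformer created without a stylesheet copies its input unchanged.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Suppress the <?xml ...?> declaration on each serialized child.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            transformer.transform(new DOMSource(children.item(i)), new StreamResult(out));
        }
        return out.toString();
    }

    // Hypothetical helper: parses an XML string and returns the inner markup of its root element.
    static String innerXmlOfRoot(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return innerXml(doc.getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        String myContent = "<h1>Hello <strong>World</strong>. This is some text.</h1>";
        System.out.println(innerXmlOfRoot(myContent));
    }
}
```

Note that the result is a re-serialization of the DOM, not the original character sequence, so details such as attribute quoting or entity escaping may differ from the source text.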