I am trying to parse an XML file containing HTML strings using DOMParser. The problem is that getTextContent() returns only the text and drops the HTML tags inside it. I expect the element's content to be returned as is, markup included, rather than the parsed text. I have searched extensively and couldn't find anything that helps. I also cannot change the HTML strings, since there are more than 100k strings spread across around 500 files.
Test.xml file
<?xml version="1.0" encoding="iso-8859-1"?>
<UserDetails xml:lang="en">
    <UserMessage ID="TestID">Text goes here. <span style="color:#DF0000"><b>Bold Text goes here.</b> </span>More Text.</UserMessage>
</UserDetails>
Java module
import com.sun.org.apache.xerces.internal.parsers.DOMParser;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

public class TestAll
{
    public static void main(String[] args)
    {
        try
        {
            File file = new File("C:/Users/Administrator/Desktop/Test.xml");
            DOMParser fileParser = new DOMParser();
            InputStream in = new FileInputStream(file);
            InputSource source = new InputSource(in);
            fileParser.parse(source);
            in.close();
            Document newFileDoc = fileParser.getDocument();
            NodeList nodes = newFileDoc.getChildNodes();
            for (int i = 0; i < nodes.getLength(); i++)
            {
                Node node = nodes.item(i);
                NodeList userMessages = node.getChildNodes();
                for (int j = 0; j < userMessages.getLength(); j++)
                {
                    Node userMessage = userMessages.item(j);
                    if (userMessage.getNodeType() == Node.ELEMENT_NODE)
                    {
                        String text = userMessage.getTextContent();
                        System.out.println(text);
                    }
                }
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}
Actual Output
Text goes here. Bold Text goes here. More Text.
Expected Output
Text goes here. <span style="color:#DF0000"><b>Bold Text goes here.</b> </span>More Text.
Any help would be appreciated.
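For reference, one direction that might work is to serialize each child node of the element back to markup instead of calling getTextContent(). The sketch below is only an assumption about what a solution could look like, not a confirmed fix: it swaps the internal DOMParser for the standard DocumentBuilderFactory, uses the DOM Level 3 LSSerializer, and the class and method names (InnerXmlSketch, innerXml, parse) are hypothetical. Note that the result is re-serialized markup, so details such as entity escaping may differ from the original bytes in the file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
public class InnerXmlSketch {

    // Serializes every child of the given node (text and elements alike)
    // and concatenates the results, so nested tags are kept.
    static String innerXml(Node node) {
        Document doc = node.getOwnerDocument();
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        // Suppress the <?xml ...?> header that would otherwise precede each fragment.
        serializer.getDomConfig().setParameter("xml-declaration", false);

        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            sb.append(serializer.writeToString(children.item(i)));
        }
        return sb.toString();
    }

    // Parses an XML string with the standard JAXP parser
    // (assumption: this replaces the internal DOMParser class).
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<UserDetails xml:lang=\"en\">"
                + "<UserMessage ID=\"TestID\">Text goes here. "
                + "<span style=\"color:#DF0000\"><b>Bold Text goes here.</b> </span>"
                + "More Text.</UserMessage></UserDetails>";
        Document doc = parse(xml);
        Node userMessage = doc.getElementsByTagName("UserMessage").item(0);
        System.out.println(innerXml(userMessage));
    }
}
```

In the loop from my module, this would mean calling innerXml(userMessage) where getTextContent() is called now.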