I searched the existing CDATA discussions, but none of the ones I found achieve what I'm attempting.
Is it possible to parse the content inside a CDATA section when the tags in it are not unique?
Below is the XML document. I'm trying to retrieve each field inside the CDATA block on line 5, which holds several fields of interest (Data Loaded, Quality, Status, Index). Each field is wrapped in an "li" tag inside the CDATA block (even though, to the XML parser, the block is plain character data):
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.0">
<Document>
<name>area Area Date: 2014-07-31</name>
<Placemark><name>P07L327</name><Point><coordinates>-96.26879,85.19125</coordinates></Point><description><![CDATA[<ol><li> Data Loaded: NO</li><li>Quality: 5</li><li>Status: UP</li><li>Index: 72</li></eol>]]></description><Style> id = "colorIcon"</Style></Placemark>
<coordinates>-96.26879,85.19125,0 -96.26879,85.19125,0 -96.26879,85.19125,0 -96.26879,85.19125,0 -96.26879,45.14698,0 </coordinates>
</Document>
</kml>
The current output looks like this:
Name: <ol><li> Data Loaded: NO</li><li>Quality: 5</li><li>Status: UP</li><li>Index: 72</li></eol>
From within the CDATA block, I want to print each field on its own line along with its value.
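To make the goal concrete, here is a minimal sketch of that transformation applied to the CDATA string alone. It assumes every field is a plain `<li>...</li>` pair with no attributes or nesting, as in the sample; the class name `LiFieldSketch` and the method `extractFields` are hypothetical and not part of my project. A regular expression is used only because the fragment is not well-formed XML (it ends in `</eol>`), so it would likely break on less regular markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiFieldSketch {
    // Non-greedy match of whatever sits between <li> and the next </li>
    private static final Pattern LI = Pattern.compile("<li>(.*?)</li>", Pattern.DOTALL);

    // Hypothetical helper: returns the trimmed text of every <li> item, in document order
    public static List<String> extractFields(String cdata) {
        List<String> fields = new ArrayList<>();
        Matcher m = LI.matcher(cdata);
        while (m.find()) {
            fields.add(m.group(1).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        // The CDATA content from the sample document
        String cdata = "<ol><li> Data Loaded: NO</li><li>Quality: 5</li>"
                + "<li>Status: UP</li><li>Index: 72</li></eol>";
        for (String field : extractFields(cdata)) {
            System.out.println(field);
        }
    }
}
```

For the sample string this prints "Data Loaded: NO", "Quality: 5", "Status: UP" and "Index: 72", one per line.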
Below is the code written so far, which produces the current output shown above:
package com.lucy.seo;

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ReadXMLFile {
    public static void main(String[] args) throws Exception {
        File fXmlFile = new File("C:/XML_UltraEdit/XML_Sandbox/Oracle_Java_Project/Test_Doc.xml");
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(fXmlFile);
        doc.getDocumentElement().normalize();
        System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
        NodeList nList = doc.getElementsByTagName("Placemark");
        System.out.println("----------------------------");
        for (int temp = 0; temp < nList.getLength(); temp++) {
            Element element = (Element) nList.item(temp);
            // Take the first <description> of this Placemark and print its CDATA content as-is
            NodeList name = element.getElementsByTagName("description");
            Element line = (Element) name.item(0);
            System.out.println("Name: " + getCharacterDataFromElement(line));
        }
    }

    // Returns the first non-blank text or CDATA child of the element, or "" if there is none
    public static String getCharacterDataFromElement(Element f) {
        NodeList list = f.getChildNodes();
        String data;
        for (int index = 0; index < list.getLength(); index++) {
            if (list.item(index) instanceof CharacterData) {
                CharacterData child = (CharacterData) list.item(index);
                data = child.getData();
                if (data != null && data.trim().length() > 0)
                    return child.getData();
            }
        }
        return "";
    }
}
Appreciate any help towards this! -- thanks!
Sep 2, 2014 update
Updated with the final solution. Thank you to everyone who posted solutions and helped. The solution is split into two programs (two separate files) because of library conflicts:
//First file; its output is the input to the second file that follows
import java.io.*;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ReadXMLFile {
    public static void main(String[] args) throws Exception {
        //redirect System.out so everything printed below lands in the intermediate HTML file
        PrintStream out = new PrintStream(new FileOutputStream("C:/XML_UltraEdit/XML_Sandbox/NetBeans_Java_Project/temp_file.html"));
        System.setOut(out);
        File fXmlFile = new File("C:/XML_UltraEdit/XML_Sandbox/NetBeans_Java_Project/raw_input.xml");
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(fXmlFile);
        //optional, but recommended
        //read this - http://stackoverflow.com/questions/13786607/normalization-in-dom-parsing-with-java-how-does-it-work
        doc.getDocumentElement().normalize();
        NodeList nList = doc.getElementsByTagName("Placemark");
        //create a buffered reader that connects to the console, we use it so we can read lines
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("<html xmlns=\"http://www.w3.org/1999/xhtml\">");
        for (int temp = 0; temp < nList.getLength(); temp++) {
            Element element = (Element) nList.item(temp);
            NodeList name = element.getElementsByTagName("description");
            Element line = (Element) name.item(0);
            //wrap each Placemark in a <bracket> element; the CDATA markup is written out unescaped
            //so that the second program can select its <li> items
            System.out.println("<bracket><li>Name: " + element.getElementsByTagName("name").item(0).getTextContent() + "</li>");
            System.out.println("<description>Description: " + getCharacterDataFromElement(line) + "</description></bracket>");
        }
        System.out.println("</html>");
        //read a line from the console (blocks until a line is entered)
        String lineFromInput = in.readLine();
        //output to the file a line
        out.println(lineFromInput);
        out.close();
    }

    //returns the first non-blank text or CDATA child of the element, or "" if there is none
    public static String getCharacterDataFromElement(Element f) {
        NodeList list = f.getChildNodes();
        String data;
        for (int index = 0; index < list.getLength(); index++) {
            if (list.item(index) instanceof CharacterData) {
                CharacterData child = (CharacterData) list.item(index);
                data = child.getData();
                if (data != null && data.trim().length() > 0)
                    return child.getData();
            }
        }
        return "";
    }
}
//Second file
package ReadXMLFile_part2;

import java.io.*;
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ReadXMLFile_part2 {
    public static void main(String[] args) throws Exception {
        //redirect System.out so the extracted fields are written to the output file
        PrintStream out = new PrintStream(new FileOutputStream("C:/XML_UltraEdit/XML_Sandbox/NetBeans_Java_Project/PA-PTH013_Output_Meters.xml"));
        System.setOut(out);
        System.out.println("*** JSOUP ***");
        File input = new File("C:/XML_UltraEdit/XML_Sandbox/NetBeans_Java_Project/temp_file.html");
        Document doc = null;
        try {
            doc = Jsoup.parse(input, "UTF-8", "http://www.w3.org/1999/xhtml");
        } catch (IOException ex) {
            Logger.getLogger(ReadXMLFile_part2.class.getName()).log(Level.SEVERE, null, ex);
            out.close();
            return; //nothing to process if the intermediate file could not be read
        }
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        Elements brackets = doc.getElementsByTag("bracket");
        for (Element bracket : brackets) {
            //print the text of every <li> inside this bracket, one per line
            Elements lis = bracket.select("li");
            for (Element li : lis) {
                System.out.println(li.text());
            }
            //note: this break stops after the first <bracket> element
            break;
        }
        System.out.println();
        //read a line from the console (blocks until a line is entered)
        String lineFromInput = in.readLine();
        //output to the file a line
        out.println(lineFromInput);
        out.close();
    }
}
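For anyone who would rather avoid the intermediate temp_file.html, the two steps can probably be combined in a single file using only JDK classes. The following is a minimal sketch, not the solution I actually used: it swaps jsoup for a regular expression, which assumes plain `<li>...</li>` pairs and "key: value" text as in the sample, and it parses an inline XML string rather than a file. The class name `PlacemarkFieldsSketch` and the method `parse` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PlacemarkFieldsSketch {
    // Assumption: list items carry no attributes and are not nested
    private static final Pattern LI = Pattern.compile("<li>(.*?)</li>", Pattern.DOTALL);

    // Hypothetical helper: maps each Placemark name to its "key: value" fields
    public static Map<String, Map<String, String>> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        NodeList placemarks = doc.getElementsByTagName("Placemark");
        for (int i = 0; i < placemarks.getLength(); i++) {
            Element pm = (Element) placemarks.item(i);
            String name = pm.getElementsByTagName("name").item(0).getTextContent();
            // getTextContent() returns the CDATA content as a plain string
            String cdata = pm.getElementsByTagName("description").item(0).getTextContent();
            Map<String, String> fields = new LinkedHashMap<>();
            Matcher m = LI.matcher(cdata);
            while (m.find()) {
                // Split "Quality: 5" into key and value at the first colon
                String[] kv = m.group(1).split(":", 2);
                if (kv.length == 2) {
                    fields.put(kv[0].trim(), kv[1].trim());
                }
            }
            result.put(name, fields);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Shortened version of the sample document
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<kml xmlns=\"http://earth.google.com/kml/2.0\"><Document>"
                + "<Placemark><name>P07L327</name><description><![CDATA[<ol>"
                + "<li> Data Loaded: NO</li><li>Quality: 5</li><li>Status: UP</li>"
                + "<li>Index: 72</li></eol>]]></description></Placemark>"
                + "</Document></kml>";
        for (Map.Entry<String, Map<String, String>> pm : parse(xml).entrySet()) {
            System.out.println("Name: " + pm.getKey());
            for (Map.Entry<String, String> f : pm.getValue().entrySet()) {
                System.out.println(f.getKey() + ": " + f.getValue());
            }
        }
    }
}
```

The CDATA fragment cannot simply be handed to a second XML parser because its closing `</eol>` does not match the opening `<ol>`, which is why a lenient HTML parser such as jsoup, or a pattern match as sketched here, is needed for that step.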