Reading comment from XML using DOM parser

Question

When I try to read comments from an XML file, the comments from both elements are printed twice as the code passes through the loop. It should print the first element's comment in the first iteration and the second element's comment in the next iteration. In case that is unclear, I have included the expected output and the actual output for reference.

XML Code:

<shipments>
  <shipment id="011">
    <department>XXXX</department>
    <!--  Product: XXXXX-->
  </shipment>   
</shipments>

Code:

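import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class Main {
    public static void main(String[] args) throws SAXException,
            IOException, ParserConfigurationException, XMLStreamException {

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        // false keeps the comments from the XML file in the DOM (true would drop them)
        factory.setIgnoringComments(false);
        DocumentBuilder builder = factory.newDocumentBuilder();

        Document doc = builder.parse(new File("Details.xml"));
        doc.getDocumentElement().normalize();

        NodeList shipmentList = doc.getElementsByTagName("shipment");

        for (int i = 0; i < shipmentList.getLength(); i++) {
            Node node = shipmentList.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                Element eElement = (Element) node;
                XMLStreamReader xr = XMLInputFactory.newInstance().createXMLStreamReader(new FileInputStream("shipmentDetails_1.xml"));
                while (xr.hasNext()) {
                    if (xr.next() == XMLStreamConstants.COMMENT) {
                        String comment = xr.getText();
                        System.out.print("Comments: ");
                        System.out.println(comment);
                    }
                }
            }
        }
    }
}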

Expected Output:

COMMENTS : Product : Laptop

COMMENTS : Product : Mobile Phone

Actual output:

Comments: Product:Laptop
Comments: Product:Mobile Phone

Hi @Vassan, can you please clarify your question? What is the problem? What output are you expecting vs what are you receiving? Why isn't the text `Product:Mobile Phone` part of the XML structure? — Jonathan Benn, Jun 10 '19 at 18:40
*"I don't have any clue about how to print [...] comment"* Similar to how you're printing elements, you write code to check for `node.getNodeType() == Node.COMMENT_NODE`, cast to a `Comment`, and print the value of `getData()`. — Andreas, Jun 10 '19 at 19:33
*"I don't have any clue about how to print XML declaration"* Then you should **read the documentation**, i.e. the javadoc of [`Document`](https://docs.oracle.com/javase/8/docs/api/org/w3c/dom/Document.html), which lists the following methods: `getXmlEncoding()`, `getXmlStandalone()`, ` getXmlVersion()` — Andreas, Jun 10 '19 at 19:39
@Vasan, you should put your `while` loop for printing the comments _outside_ of the `for` loop. It seems that the comments are read all at once by the parser as _unstructured data_. If you want to read the comments as structured data you will need to include them in the XML structure (_i.e._ they need an element tag!) — Jonathan Benn, Jun 12 '19 at 12:38

Andreas · Accepted Answer · 2019-06-11T17:33:36.037

To get the values from the XML declaration, call the following methods on the Document:

getXmlEncoding() - An attribute specifying, as part of the XML declaration, the encoding of this document. This is null when unspecified or when it is not known, such as when the Document was created in memory.
getXmlStandalone() - An attribute specifying, as part of the XML declaration, whether this document is standalone. This is false when unspecified.
getXmlVersion() - An attribute specifying, as part of the XML declaration, the version number of this document. If there is no declaration and if this document supports the "XML" feature, the value is "1.0".
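As a minimal sketch of these three calls, assuming a small inline document: the XML string, the class name DeclarationDemo, and the parse helper are made up for illustration and are not part of the question's code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DeclarationDemo {
    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input; a real program would parse the file instead.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><shipments/>";
        Document doc = parse(xml);

        // Each getter reports one attribute of the XML declaration.
        System.out.println("Version: " + doc.getXmlVersion());
        System.out.println("Encoding: " + doc.getXmlEncoding());
        System.out.println("Standalone: " + doc.getXmlStandalone());
    }
}
```

If the file has no XML declaration, expect getXmlEncoding() to return null and getXmlStandalone() to return false, as described above.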

UPDATED

To find and print the comments inside a <shipment> element, iterate over the element's child nodes, look for nodes of type COMMENT_NODE, cast each one to a Comment, and print the value of getData().

for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
    if (child.getNodeType() == Node.COMMENT_NODE) {
        Comment comment = (Comment) child;
        System.out.println("COMMENTS : " + comment.getData());
    }
}

To clarify: the node used here is the one from the question's code. You can also use eElement instead of node; it makes no difference.
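Putting the pieces together, here is a self-contained sketch of the per-element approach. The class name ShipmentComments, the helper methods, and the inline two-shipment sample are assumptions for illustration; the question's real file is not shown in full, so the sample only mirrors its expected output.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ShipmentComments {
    // Collects the trimmed text of every comment that is a direct child of the element.
    static List<String> commentsOf(Element element) {
        List<String> result = new ArrayList<>();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.COMMENT_NODE) {
                result.add(((Comment) child).getData().trim());
            }
        }
        return result;
    }

    // Hypothetical helper: parses the XML once and returns one comment list per <shipment>.
    static List<List<String>> commentsPerShipment(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(false); // keep comment nodes in the DOM
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList shipments = doc.getElementsByTagName("shipment");
        List<List<String>> all = new ArrayList<>();
        for (int i = 0; i < shipments.getLength(); i++) {
            all.add(commentsOf((Element) shipments.item(i)));
        }
        return all;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data modelled on the question's expected output.
        String xml = "<shipments>"
                + "<shipment id=\"011\"><department>A</department><!-- Product: Laptop --></shipment>"
                + "<shipment id=\"012\"><department>B</department><!-- Product: Mobile Phone --></shipment>"
                + "</shipments>";
        for (List<String> comments : commentsPerShipment(xml)) {
            for (String comment : comments) {
                System.out.println("COMMENTS : " + comment);
            }
        }
    }
}
```

The document is parsed only once, and each iteration looks only at the children of its own <shipment> element, so no second reader over the file is needed.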

Thanks @andreas. I was able to get the encoding and version. For the comment, I tried what you mentioned, but it didn't work. — Kavin, Jun 10 '19 at 23:11
I did it in different way. I have posted Expected output and output which i am getting — Kavin, Jun 11 '19 at 02:58
@Vasan Answer was updated to correctly find comments belonging to a given `<shipment>` element. — Andreas, Jun 11 '19 at 17:22
I believe this answer is correct. But need to add that in the answer, Node child = node.getFirstChild() is used. 'node' there is really from node that was type casted to Element in previous statement. There is no need to read the xml again (especially inside the loop) and 'xr' should be dropped — Jayr, Jun 11 '19 at 17:30

score -1 · Answer 2 · answered Jun 10 '19 at 18:56


To obtain the XML Declaration and comments, I would suggest loading the file as a text file and parsing it via regular expressions. For example:

    String file = new String(Files.readAllBytes(Paths.get("shipmentDetails_1.xml")), StandardCharsets.UTF_8);

    Pattern pattern = Pattern.compile("<!--([\\s\\S]*?)-->");
    Matcher matcher = pattern.matcher(file);
    while (matcher.find()) {
        System.out.println("COMMENTS: " + matcher.group(1));
    }

    Pattern pattern2 = Pattern.compile("<\\?xml([\\s\\S]*?)\\?>");
    Matcher matcher2 = pattern2.matcher(file);
    while (matcher2.find()) {
        System.out.println("DECLARATION: " + matcher2.group(1));
    }

answered Jun 10 '19 at 18:56 by Jonathan Benn

Downvoted for suggesting "parsing [XML] via regular expressions". As bad as [parsing HTML](https://stackoverflow.com/a/1732454/5221149). – Andreas Jun 10 '19 at 19:35
@Andreas, I agree with you in the general case, but regular expressions can be a handy (and very quick) solution for certain edge cases. I probably should have read the documentation, but in this case I wasn't sure if it was even possible to get the parser to read comments. But from your answer it's clear that it's possible. – Jonathan Benn Jun 12 '19 at 12:30
