3

When I try to read comments from an XML file, the comments from both elements are printed twice as the code passes through the loop. It should print the first element's comment in the first iteration and the second element's comment in the next iteration. In case this is not clear, I have included the expected output and the actual output for reference.

XML Code:

<shipments>
  <shipment id="011">
    <department>XXXX</department>
    <!--  Product: XXXXX-->
  </shipment>   
</shipments>

Code:

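import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class Main {
    public static void main(String[] args) throws SAXException,
            IOException, ParserConfigurationException, XMLStreamException {

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        // Keep comments in the parsed document (false = do not ignore them)
        factory.setIgnoringComments(false);
        DocumentBuilder builder = factory.newDocumentBuilder();

        Document doc = builder.parse(new File("Details.xml"));
        doc.getDocumentElement().normalize();

        NodeList ShipmentList = doc.getElementsByTagName("shipment");

        for (int i = 0; i < ShipmentList.getLength(); i++) {
            Node node = ShipmentList.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                Element eElement = (Element) node;
                // A new stream reader over the whole file is created for each shipment
                XMLStreamReader xr = XMLInputFactory.newInstance()
                        .createXMLStreamReader(new FileInputStream("shipmentDetails_1.xml"));
                while (xr.hasNext()) {
                    if (xr.next() == XMLStreamConstants.COMMENT) {
                        String comment = xr.getText();
                        System.out.print("Comments: ");
                        System.out.println(comment);
                    }
                }
            }
        }
    }
}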

Expected Output:

COMMENTS : Product : Laptop

COMMENTS : Product : Mobile Phone

Actual output:

Comments: Product:Laptop
Comments: Product:Mobile Phone

Comments: Product:Laptop
Comments: Product:Mobile Phone

StealthRT
Kavin
  • Hi @Vassan, can you please clarify your question? What is the problem? What output are you expecting vs what are you receiving? Why isn't the text `Product:Mobile Phone` part of the XML structure? – Jonathan Benn Jun 10 '19 at 18:40
  • *"I don't have any clue about how to print [...] comment"* Similar to how you're printing elements, you write code to check for `node.getNodeType() == Node.COMMENT_NODE`, cast to a `Comment`, and print the value of `getData()`. – Andreas Jun 10 '19 at 19:33
  • *"I don't have any clue about how to print XML declaration"* Then you should **read the documentation**, i.e. the javadoc of [`Document`](https://docs.oracle.com/javase/8/docs/api/org/w3c/dom/Document.html), which lists the following methods: `getXmlEncoding()`, `getXmlStandalone()`, ` getXmlVersion()` – Andreas Jun 10 '19 at 19:39
  • @Vasan, you should put your `while` loop for printing the comments _outside_ of the `for` loop. It seems that the comments are read all at once by the parser as _unstructured data_. If you want to read the comments as structured data you will need to include them in the XML structure (_i.e._ they need an element tag!) – Jonathan Benn Jun 12 '19 at 12:38

2 Answers

1

To get the values from the XML declaration, call the following methods on the Document:

  • getXmlEncoding() - An attribute specifying, as part of the XML declaration, the encoding of this document. This is null when unspecified or when it is not known, such as when the Document was created in memory.

  • getXmlStandalone() - An attribute specifying, as part of the XML declaration, whether this document is standalone. This is false when unspecified.

  • getXmlVersion() - An attribute specifying, as part of the XML declaration, the version number of this document. If there is no declaration and if this document supports the "XML" feature, the value is "1.0".
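As a minimal sketch of the three methods listed above, the following parses a small document and reads back the values from its XML declaration. The class name `DeclarationDemo`, the `parse` helper, and the inlined `SAMPLE` document are assumptions made for illustration; they are not part of the question's code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class DeclarationDemo {
    // Hypothetical sample document, inlined so the sketch runs without a file.
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
            + "<shipments><shipment id=\"011\"/></shipments>";

    // Parses the given XML text into a DOM Document.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        // Each getter reflects one pseudo-attribute of the XML declaration.
        System.out.println("version    = " + doc.getXmlVersion());
        System.out.println("encoding   = " + doc.getXmlEncoding());
        System.out.println("standalone = " + doc.getXmlStandalone());
    }
}
```

With a file on disk, `builder.parse(new File(...))` as in the question would work the same way; the byte stream is used here only to keep the sketch self-contained.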


UPDATED

To find and print the comments inside a <shipment> element, iterate over the element's child nodes, look for nodes of type COMMENT_NODE, cast each one to Comment, and print the value of getData().

for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
    if (child.getNodeType() == Node.COMMENT_NODE) {
        Comment comment = (Comment) child;
        System.out.println("COMMENTS : " + comment.getData());
    }
}

To clarify: node here is the variable from the question's code. You can also use eElement instead of node; it makes no difference.
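Putting the pieces together, here is a self-contained sketch of the child-node approach that collects comments per <shipment>. It needs no second parser inside the loop. The class name `ShipmentComments`, the method `commentsPerShipment`, and the two-shipment `SAMPLE` document (including the id "012" and department "YYYY") are hypothetical, modelled on the question's XML and expected output.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class ShipmentComments {
    // Hypothetical two-shipment document modelled on the question's XML.
    static final String SAMPLE =
            "<shipments>"
            + "<shipment id=\"011\"><department>XXXX</department>"
            + "<!-- Product: Laptop --></shipment>"
            + "<shipment id=\"012\"><department>YYYY</department>"
            + "<!-- Product: Mobile Phone --></shipment>"
            + "</shipments>";

    // Returns, for each <shipment> in document order, the trimmed text of
    // the comments that are its direct children.
    static List<List<String>> commentsPerShipment(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setIgnoringComments(false); // keep comment nodes in the DOM
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            NodeList shipments = doc.getElementsByTagName("shipment");
            List<List<String>> result = new ArrayList<>();
            for (int i = 0; i < shipments.getLength(); i++) {
                List<String> comments = new ArrayList<>();
                // Walk only this shipment's children, not the whole file.
                for (Node child = shipments.item(i).getFirstChild();
                        child != null; child = child.getNextSibling()) {
                    if (child.getNodeType() == Node.COMMENT_NODE) {
                        comments.add(((Comment) child).getData().trim());
                    }
                }
                result.add(comments);
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        for (List<String> comments : commentsPerShipment(SAMPLE)) {
            for (String comment : comments) {
                System.out.println("COMMENTS : " + comment);
            }
        }
    }
}
```

Because each iteration looks only at the children of its own shipment, every comment is reported once, under the element it belongs to.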

Andreas
  • Thanks @andreas. I was able to get the encoding and version. For the comment, I tried what you suggested, but it didn't work. – Kavin Jun 10 '19 at 23:11
  • I did it a different way. I have posted the expected output and the output I am getting. – Kavin Jun 11 '19 at 02:58
  • @Vasan Answer was updated to correctly find comments belonging to a given `<shipment>` element. – Andreas Jun 11 '19 at 17:22
  • I believe this answer is correct, but it should point out where `node` in `Node child = node.getFirstChild()` comes from: it is the node from the question's code that was cast to Element in the previous statement. There is no need to read the XML again (especially inside the loop), and `xr` should be dropped. – Jayr Jun 11 '19 at 17:30
-1

To obtain the XML Declaration and comments, I would suggest loading the file as a text file and parsing it via regular expressions. For example:

    String file = new String(Files.readAllBytes(Paths.get("shipmentDetails_1.xml")), StandardCharsets.UTF_8);

    Pattern pattern = Pattern.compile("<!--([\\s\\S]*?)-->");
    Matcher matcher = pattern.matcher(file);
    while (matcher.find()) {
        System.out.println("COMMENTS: " + matcher.group(1));
    }

    Pattern pattern2 = Pattern.compile("<\\?xml([\\s\\S]*?)\\?>");
    Matcher matcher2 = pattern2.matcher(file);
    while (matcher2.find()) {
        System.out.println("DECLARATION: " + matcher2.group(1));
    }
Jonathan Benn
  • Downvoted for suggesting "parsing [XML] via regular expressions". As bad as [parsing HTML](https://stackoverflow.com/a/1732454/5221149). – Andreas Jun 10 '19 at 19:35
  • @Andreas, I agree with you in the general case, but regular expressions can be a handy (and very quick) solution for certain edge cases. I probably should have read the documentation, but in this case I wasn't sure if it was even possible to get the parser to read comments. But from your answer it's clear that it's possible. – Jonathan Benn Jun 12 '19 at 12:30