
I am facing an issue parsing XML to extract data from a specific node. I referred to Link1 Link2 Link3. Note that I am able to parse and get the data for other nodes in the XML file below, such as id and order_id. But for the line / node below, I am unable to extract segment_id and instrument_id:

<trade segment_id="NSE-F&amp;O " instrument_id="NSE:INFRATEL17NOVFUT">

I am not sure whether the way the XML file is set up is wrong, or the way I am trying to extract the data for that specific node. I hope the specific issue I face is clear.

XML File:

<contract_note version="0.1">
 <contracts>
  <contract>
   <id>CNT-17/18-5310750</id>
   <name>CONTRACT NOTE CUM BILL</name>
   <description>None</description>
   <timestamp>2017-11-01</timestamp>
   <trades>
      <trade segment_id="NSE-F&amp;O " instrument_id="NSE:INFRATEL17NOVFUT">
      <id>37513030</id>
      <order_id>1300000000352370</order_id>
      <timestamp>09:20:48</timestamp>
      <description>None</description>
      <type>buy</type>
      <quantity>1700</quantity>
      <average_price>444.2</average_price>
      <value>755140.0</value>
      </trade>
   </trades>
  </contract>
 </contracts>
</contract_note>

Code:

try {
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(xmlFile);

    NodeList cNoteList = doc.getElementsByTagName("contract");
    Node nNode = cNoteList.item(0);
    if (nNode.getNodeType() == Node.ELEMENT_NODE) {
        Element eElement = (Element) nNode;
        for (int j = 1; j <= eElement.getElementsByTagName("trade").getLength(); j++) {
            // Check if data can be read for node 'id'
            // (item(0) is the contract's own <id>, so trade ids start at index 1)
            System.out.println(eElement.getElementsByTagName("id").item(j).getTextContent());
            // Check if data can be read for segment_id & instrument_id
            System.out.println("Scrip: " + eElement.getElementsByTagName("trade").item(0).getTextContent());
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}

Edit: Corrected the xml file info provided above.

iCoder
  • The & character is what is causing trouble. It should be escaped. – Juan Nov 06 '17 at 13:12
  • After some back-and-forth with OP, real question turns out to be a basic "how do I read XML attribute values in Java" question. Closed as duplicate. – kjhughes Nov 06 '17 at 14:25
  • Thank you & my sincere apologies for not having searched the thread you cited. – iCoder Nov 06 '17 at 14:32

1 Answer

As @Juan commented, your XML is bad. Fix it by following the required XML escaping rules and replacing segment_id="NSE-F&O " with segment_id="NSE-F&amp;O ".

If you cannot change the XML, then see How to parse invalid (bad / not well-formed) XML? for options, but the best option is to fix the XML at the source.
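
To check whether a given attribute value is the problem, a minimal sketch along these lines may help. It assumes the standard JAXP DOM parser; the class name `EscapeCheck` and the method `isWellFormed` are hypothetical names used only for illustration. It tries to parse a string and reports whether the parser accepted it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original question or answer.
class EscapeCheck {
    // Returns true if the parser accepts the string as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, e.g. a bare '&' in an attribute value
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "<trade segment_id=\"NSE-F&O \" instrument_id=\"NSE:INFRATEL17NOVFUT\"/>";
        String escaped = "<trade segment_id=\"NSE-F&amp;O \" instrument_id=\"NSE:INFRATEL17NOVFUT\"/>";
        System.out.println("raw '&' accepted:     " + isWellFormed(raw));
        System.out.println("'&amp;' accepted:     " + isWellFormed(escaped));
    }
}
```

If the escaped form parses and the attributes still cannot be read, the XML itself is probably fine and the problem lies in how the value is being extracted.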

kjhughes
  • Thanks, I opened the XML in Notepad now and notice it as segment_id="NSE-F&amp;O ", so the file should be fine, I presume. – iCoder Nov 06 '17 at 13:31
  • Confused by your comment: *Notice*? Did you not fix it? *Presume*? Have you not retested? – kjhughes Nov 06 '17 at 13:33
  • Sorry for the confusion. I had opened the XML in IE, which shows it as segment_id="NSE-F&O ". But if I open the same XML in Notepad, it is shown as segment_id="NSE-F&amp;O ". I have not made any changes to the XML file. But when I try to parse it, I am not able to read that data. Is there any other mistake I am making? – iCoder Nov 06 '17 at 13:38
  • Please add the error message and stack trace to your question, along with a proper [mcve]. – kjhughes Nov 06 '17 at 14:11
  • There is no error issued. But the output just reads as 'trade'. My expectation was that it should read as 'trade segment_id="NSE-F&O " instrument_id="NSE:INFRATEL17NOVFUT"'. – iCoder Nov 06 '17 at 14:17
  • Use `getAttributes()` for attributes. Closing this question as a duplicate... – kjhughes Nov 06 '17 at 14:23
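
As a rough illustration of the last comment: `getTextContent()` returns only the text inside an element, never its attributes, so the attribute values have to be read separately. The sketch below uses `Element.getAttribute(name)`, the by-name counterpart of `getAttributes()` in the standard DOM API. The class name `TradeAttributes`, the method `readTradeAttributes`, and the `|` separator are assumptions made for the example, and the XML is trimmed down from the question's file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the original question or answer.
class TradeAttributes {
    // Collects "segment_id|instrument_id" for every <trade> element in the XML.
    static List<String> readTradeAttributes(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList trades = doc.getElementsByTagName("trade");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < trades.getLength(); i++) {
            Element trade = (Element) trades.item(i);
            // getAttribute returns the parsed value, so &amp; comes back as a plain '&'.
            result.add(trade.getAttribute("segment_id") + "|" + trade.getAttribute("instrument_id"));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<trades>"
          + "<trade segment_id=\"NSE-F&amp;O \" instrument_id=\"NSE:INFRATEL17NOVFUT\">"
          + "<id>37513030</id>"
          + "</trade>"
          + "</trades>";
        for (String line : readTradeAttributes(xml)) {
            System.out.println("Scrip: " + line);
        }
    }
}
```

Note that the trailing space in the `segment_id` value is preserved by the parser, so a `trim()` may be wanted before comparing it with other strings.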