I am trying to parse an XML file in Java, and it works just fine, but I do not really understand why. I have the following code (trimmed to the important parts):

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();

Document document = builder.parse(new File(fileName));

NodeList nodeList = document.getDocumentElement().getChildNodes();

for (int i = 0; i < nodeList.getLength(); i++) {
    Node node = nodeList.item(i);

    if (node.getNodeType() == Node.ELEMENT_NODE) {
        Element elem = (Element) node;

        // Get the value of all sub-elements.
        String original = elem.getElementsByTagName("Original")
                .item(0).getChildNodes().item(0).getNodeValue();

        String translation = elem.getElementsByTagName("Translation")
                .item(0).getChildNodes().item(0).getNodeValue();

        Integer score = Integer.parseInt(elem.getElementsByTagName("Score")
                .item(0).getChildNodes().item(0).getNodeValue());
    }
}

My XML is a simple one:

<?xml version="1.0" encoding="UTF-8"?>
    <Dictionary>
         <Word>
              <Original>die Unterwäsche</Original >
              <Translation>Bielizna</Translation>
              <Score>-4</Score>
         </Word>
         <Word>
              <Original>die Müche</Original>
              <Translation>Fatyga, trud</Translation>
              <Score>0</Score>
         </Word>
         <Word>
              <Original>wetten</Original>
              <Translation>założyć się</Translation>
              <Score>-6</Score>
         </Word>
         <Word>
              <Original>umsonst</Original>
              <Translation>Bez powodu</Translation>
              <Score>0</Score>
         </Word>
    </Dictionary>

The big question is: why do I get 9 nodes when calling nodeList.getLength()? I printed them, and 4 are elements (which seems fine) and the other 5 are text nodes, but I do not really understand what they are. And why is the Node cast to Element?

The second thing is this part:

elem.getElementsByTagName("Score")
         .item(0).getChildNodes().item(0).getNodeValue();

I am calling item(0) on a found node, but again, what is it in practice?

I would really appreciate your help. I am quite a beginner and have been struggling with this for a while now. A step-by-step guide to what is what, with the relevant parts of my XML listed, would mean the world to me.

KKeff

1 Answer

why I have 9 nodes when calling nodeList.getLength() ?

The 9 nodes are the direct children of <Dictionary>:

4 <Word> elements
5 whitespace-only text nodes (the line break and indentation after <Dictionary> and after each </Word>)

5 others are text nodes, but I do not really get what they are

<?xml version="1.0" encoding="UTF-8"?>
<Dictionary>                         <-- whitespace text (line break + indentation)
    <Word>
        <Original>...
        <Translation>...
        <Score>...
    </Word>                          <-- whitespace text
    <Word>
        <Original>...
        <Translation>...
        <Score>...
    </Word>                          <-- whitespace text
    <Word>
        <Original>...
        <Translation>...
        <Score>...
    </Word>                          <-- whitespace text
    <Word>
        <Original>...
        <Translation>...
        <Score>...
    </Word>                          <-- whitespace text
</Dictionary>
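
One way to see where the text nodes come from is to count the root's children by node type. The following is only a minimal sketch: it assumes a shortened inline copy of the question's XML with the same layout (one <Word> per line, indented), and the names ChildNodeCounter, SAMPLE_XML and countChildren are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ChildNodeCounter {

    // Assumption: a shortened stand-in for the question's file, keeping its
    // layout of four <Word> entries, each on its own indented line.
    static final String SAMPLE_XML =
          "<Dictionary>\n"
        + "    <Word><Original>die Unterwäsche</Original><Score>-4</Score></Word>\n"
        + "    <Word><Original>die Müche</Original><Score>0</Score></Word>\n"
        + "    <Word><Original>wetten</Original><Score>-6</Score></Word>\n"
        + "    <Word><Original>umsonst</Original><Score>0</Score></Word>\n"
        + "</Dictionary>";

    /** Returns {elementCount, textNodeCount} for the direct children of the root element. */
    static int[] countChildren(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList children = document.getDocumentElement().getChildNodes();

            int elements = 0;
            int texts = 0;
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    elements++;          // a <Word> element
                } else if (child.getNodeType() == Node.TEXT_NODE) {
                    texts++;             // line break and indentation between tags
                }
            }
            return new int[] { elements, texts };
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        int[] counts = countChildren(SAMPLE_XML);
        System.out.println("elements=" + counts[0] + ", text nodes=" + counts[1]);
    }
}
```

With this sample the loop finds 4 elements and 5 text nodes. Removing all whitespace between the tags in the input would leave only the 4 elements.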

And why is Node casted on Element?

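To answer this last part, I refer you to another post: What's the difference between an element and a node in XML? In short, Node is the general DOM interface and Element is one of its subtypes, so the cast is needed before you can call Element-only methods such as getElementsByTagName.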
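
As a rough illustration of the Node/Element distinction, the sketch below lists the root's children and casts only those whose node type is ELEMENT_NODE. The class name NodeVersusElement and the helper describeChildren are hypothetical names for this example, and the XML passed in main is an assumed one-entry cut-down of the question's file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class NodeVersusElement {

    /**
     * Describes each direct child of the root element. Elements are reported
     * by tag name plus the number of <Original> descendants; every other node
     * is reported by its generic node name.
     */
    static List<String> describeChildren(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            NodeList children = root.getChildNodes();

            List<String> result = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    // Safe to cast: the node type was checked first.
                    Element elem = (Element) child;
                    // getTagName and getElementsByTagName exist on Element, not on Node.
                    int originals = elem.getElementsByTagName("Original").getLength();
                    result.add("element:" + elem.getTagName() + ":" + originals);
                } else {
                    // Text nodes only offer the generic Node methods.
                    result.add("other:" + child.getNodeName());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Dictionary>\n  <Word><Original>wetten</Original></Word>\n</Dictionary>";
        System.out.println(describeChildren(xml));
    }
}
```

Casting a text node to Element would fail with a ClassCastException, which is why the node-type check comes before the cast in the question's code.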

ThisClark
  • Thanks @ThisClark, that is clear now. One more thing: I noticed that instead of `elem.getElementsByTagName("Score").item(0).getChildNodes().item(0).getNodeValue()` I can use `elem.getElementsByTagName("Score").item(0).getTextContent()`, and it works. Is that still all right? – KKeff May 13 '15 at 19:03
  • If it works for your needs, it is absolutely alright. If you are looking for better ways to work with XML, you'll have to read more about the topic and practice different approaches until you are more comfortable. Look at implementations in SAX and StAX. – ThisClark May 13 '15 at 21:12
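
On the two access styles compared in the comments: getElementsByTagName returns a list of matching descendant elements, so the first item(0) picks the first such element, and the second item(0) picks that element's first child, which is the text node holding the value. The sketch below puts both styles side by side. It is only an illustration: TextAccessDemo, parseRoot, viaNodeValue and viaTextContent are made-up names, and the empty <Note> element is a hypothetical addition, not part of the question's XML, included to show where the two styles differ.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class TextAccessDemo {

    // Assumption: a single <Word>; <Note> is hypothetical and deliberately empty.
    static final String SAMPLE_XML =
        "<Word><Original>wetten</Original><Score>-6</Score><Note></Note></Word>";

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** The question's style: first matching element, then its first child node. */
    static String viaNodeValue(Element parent, String tag) {
        Node element = parent.getElementsByTagName(tag).item(0);   // e.g. the <Score> element
        Node firstChild = element.getChildNodes().item(0);         // its text node, or null if empty
        return firstChild == null ? null : firstChild.getNodeValue();
    }

    /** The comment's style: the element's text content, "" when it has no children. */
    static String viaTextContent(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element word = parseRoot(SAMPLE_XML);
        System.out.println(viaNodeValue(word, "Score"));
        System.out.println(viaTextContent(word, "Score"));
        System.out.println(viaNodeValue(word, "Note"));
        System.out.println("[" + viaTextContent(word, "Note") + "]");
    }
}
```

For elements that contain only text, both styles return the same string. For an empty element, the item(0) chain without the null check would throw a NullPointerException, while getTextContent() returns an empty string.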