How to read an XML file using Java

Question

I am trying to read some data from an XML file and am having some trouble. The XML I have is as follows:

 <Tree>
  <child>
   <Property Name="id"/>
   <Property Name="username">abc</Property>
   <Property Name="phoneType">phone1</Property>
   <Property Name="value">123456</Property>
   </child>
   <child>
   <Property Name="id"/>
   <Property Name="username">def</Property>
   <Property Name="phoneType">phone2</Property>
   <Property Name="value">6789012</Property>
   </child>
   </Tree>

I am trying to read these values as strings into my Java program. I have written this code so far:

File fXmlFile = new File("C:\\Users\\welcome\\Downloads\\ta\\abc.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
doc.getDocumentElement().normalize();

System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("child");
System.out.println("----------------------------");

for (int temp = 0; temp < nList.getLength(); temp++) {
    Node nNode = nList.item(temp);
    System.out.println("\nCurrent Element :" + nNode.getNodeName());
    if (nNode.getNodeType() == Node.ELEMENT_NODE) {
        Element eElement = (Element) nNode;
        System.out.println("id id : "
                           + eElement.getAttribute("id"));
    }
}

I am struggling to read and print the values of id, username etc.

`NodeList nList = doc.getElementsByTagName("Object")` - you don't have any elements called `Object`... they're called `Property`. Likewise you don't have any attributes called `id`. The only attributes in your document are called `Name`. — Jon Skeet, Jan 17 '22 at 16:43
`getAttribute("id")` isn't correct. Your attribute is `Name` with a value of id. You're also skipping the child nodes. Not sure if that's intentional — OneCricketeer, Jan 17 '22 at 16:44
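
Putting the two comments above together, a minimal DOM sketch might look like the following. It assumes the XML from the question is available as a string; the class name `DomPropertyReader` and the helper `readChildren` are made up for illustration and are not part of any library. It walks each `child`, then each nested `Property`, and reads the `Name` attribute together with the element's text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomPropertyReader {

    // Hypothetical helper: returns one (Name -> text) map per <child> element.
    public static List<Map<String, String>> readChildren(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<Map<String, String>> result = new ArrayList<>();
        NodeList children = doc.getElementsByTagName("child");
        for (int i = 0; i < children.getLength(); i++) {
            Element child = (Element) children.item(i);
            Map<String, String> props = new LinkedHashMap<>();
            NodeList properties = child.getElementsByTagName("Property");
            for (int j = 0; j < properties.getLength(); j++) {
                Element property = (Element) properties.item(j);
                // "Name" is the attribute; the value is the element's text content
                // (an empty string for <Property Name="id"/>).
                props.put(property.getAttribute("Name"), property.getTextContent());
            }
            result.add(props);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Tree>"
                + "<child><Property Name=\"id\"/><Property Name=\"username\">abc</Property></child>"
                + "<child><Property Name=\"id\"/><Property Name=\"username\">def</Property></child>"
                + "</Tree>";
        for (Map<String, String> props : readChildren(xml)) {
            System.out.println(props);
        }
    }
}
```

To read from a file instead, `DocumentBuilder.parse` also accepts a `java.io.File`, as in the question's own code.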

score 0 · Answer 1 · answered Jan 17 '22 at 20:41

I recommend using a library like jsoup to read XML files, since it gives you a lot of functionality out of the box.

Also read: How to parse XML with jsoup

gru

score 0 · Answer 2 · answered Jan 17 '22 at 22:07

You can use XPath to run a wide variety of queries against the document.

File fXmlFile = new File("C:\\Users\\welcome\\Downloads\\ta\\abc.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);

// Get an XPath object and evaluate the expression.
// Attribute names are case-sensitive, so the predicate must use @Name, not @name.
XPath xpath = XPathFactory.newInstance().newXPath();
String propertyId = xpath.evaluate("/Tree/child[1]/Property[@Name='id']", doc);

More likely you want to loop over all the child elements, which can be done like this:

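NodeList children = (NodeList) xpath.evaluate("/Tree/child", doc, XPathConstants.NODESET);
for (int i = 0; i < children.getLength(); i++) {
    Element child = (Element) children.item(i); // NodeList exposes item(int), not get(int)
    // "id" is the value of the Name attribute on a Property element, not an attribute of <child>
    String propertyId = xpath.evaluate("Property[@Name='id']", child);
    ...
}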
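
As a self-contained sketch of the same loop, the following collects the text of every `Property` with a given `Name` across all `child` elements. The class name `XPathPropertyReader` and the helper `valuesOf` are hypothetical, and the XML is passed in as a string purely to keep the example runnable without a file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathPropertyReader {

    // Hypothetical helper: one value per <child>, in document order.
    // A <child> without a matching Property contributes an empty string.
    public static List<String> valuesOf(String xml, String propertyName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        List<String> values = new ArrayList<>();
        NodeList children = (NodeList) xpath.evaluate("/Tree/child", doc, XPathConstants.NODESET);
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Relative expression, evaluated with the current <child> as context node.
            // The name is concatenated for brevity; this assumes it contains no quotes.
            values.add(xpath.evaluate("Property[@Name='" + propertyName + "']", child));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Tree>"
                + "<child><Property Name=\"username\">abc</Property></child>"
                + "<child><Property Name=\"username\">def</Property></child>"
                + "</Tree>";
        System.out.println(valuesOf(xml, "username"));
    }
}
```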

hfontanez · Answer 3 · 2022-01-25T12:40:00.990

Reading XML files is not a trivial task by any means. It requires the reader to have intimate knowledge of the structure of the file: the element names, the attribute names, the data types of the attributes, the order of the elements, and whether the elements are simple or complex (meaning they are flat or have nested elements underneath).

One solution, as shown by Jon Skeet's comment, is to use the Java Document API. This interface has all the methods you will need to get data from an XML file. However, in my opinion, this still leaves the reader with the task of knowing the element and attribute names.

If an XML schema (XSD) or Document Type Definition (DTD) for a given XML is available or can be easily constructed, I prefer to use one of the many libraries to parse XML contents; to name a few: StAX, JDOM, DOM4J, and JAXB. Because I have used it extensively, I prefer JAXB. There are some limitations to JAXB, but those are out of scope for this discussion. One thing worth mentioning is that JAXB is included in Java distributions from Java 6 to 10. Outside of those versions, you must download the JAXB distribution yourself.
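
Of the libraries named above, StAX ships with the JDK (package `javax.xml.stream`), so it can be tried without any download. The following is only a rough sketch of a streaming read of the OP's document; the class `StaxPropertyReader` and its method are invented for illustration, and it assumes every `Property` element contains text only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxPropertyReader {

    // Hypothetical helper: returns "Name=text" for every Property element, in document order.
    public static List<String> readProperties(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> lines = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "Property".equals(reader.getLocalName())) {
                    // Read the attribute before advancing past the start tag.
                    String name = reader.getAttributeValue(null, "Name");
                    // getElementText() consumes up to the matching end tag;
                    // it only works for elements without nested child elements.
                    String value = reader.getElementText();
                    lines.add(name + "=" + value);
                }
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Tree><child>"
                + "<Property Name=\"id\"/>"
                + "<Property Name=\"username\">abc</Property>"
                + "</child></Tree>";
        readProperties(xml).forEach(System.out::println);
    }
}
```

Note that the element and attribute names still appear as string literals here, which is the burden on the reader that generated JAXB classes are meant to remove.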

One of the primary reasons I use JAXB is that I can use annotations in POJOs to structure a class according to existing XMLs without needing to build a schema. Of course, this is not always simple to do. It is almost always easier to compile your JAXB classes from a schema. Because this produces custom Java classes for your XML documents, you can access elements and attributes through their getter methods, rather than putting the burden on the reader to know the element names.

I used the OP's XML file to generate a schema using XML Copy Editor. The resulting schema looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="Tree">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="child" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="child">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Property" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Property">
    <xs:complexType mixed="true">
      <xs:attribute name="Name" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
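
As a quick sanity check of a generated schema, the JDK's `javax.xml.validation` API can validate a document against it. The sketch below inlines a whitespace-stripped copy of the schema above as a string so that it runs on its own; the class name `SchemaCheck` and the method `isValid` are made up for this example.

```java
import java.io.StringReader;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.xml.sax.SAXException;

public class SchemaCheck {

    // Copy of the schema shown above, inlined so the sketch is self-contained.
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" elementFormDefault=\"qualified\">"
          + "<xs:element name=\"Tree\"><xs:complexType><xs:sequence>"
          + "<xs:element ref=\"child\" maxOccurs=\"unbounded\"/>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "<xs:element name=\"child\"><xs:complexType><xs:sequence>"
          + "<xs:element ref=\"Property\" maxOccurs=\"unbounded\"/>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "<xs:element name=\"Property\"><xs:complexType mixed=\"true\">"
          + "<xs:attribute name=\"Name\" type=\"xs:string\" use=\"required\"/>"
          + "</xs:complexType></xs:element>"
          + "</xs:schema>";

    // Hypothetical helper: true if the XML string conforms to the schema above.
    public static boolean isValid(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // With no ErrorHandler set, validation errors are thrown as SAXExceptions.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String ok = "<Tree><child><Property Name=\"username\">abc</Property></child></Tree>";
        String missingName = "<Tree><child><Property>abc</Property></child></Tree>";
        System.out.println(isValid(ok));
        System.out.println(isValid(missingName));
    }
}
```

In a real project the schema would more likely be loaded from the `.xsd` file, for which `SchemaFactory.newSchema` also has a `java.io.File` overload.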

Once you have the schema, you can use it to compile the JAXB classes using the XJC compiler that comes with Java. Here is an example of how to compile the JAXB classes: https://docs.oracle.com/javase/tutorial/jaxb/intro/examples.html

To download the JAXB compiler, go to https://javaee.github.io/jaxb-v2/ and click "Download standalone distribution". You can place the contents of the ZIP file anywhere on your computer. Then, simply set JAXB_HOME in your environment variables and you are set. This might seem like a lot of work, but up to this point these are one-time activities. The upside is that once your environment is set up, it takes only seconds to compile all your classes, even if you need to generate the schema from your XML first.

Executing the compiler generated Tree.java, Child.java, and Property.java.

Tree.java

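import java.util.ArrayList;
import java.util.List;

import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;

@XmlAccessorType(XmlAccessType.FIELD)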
@XmlType(name = "", propOrder = {
    "child"
})
@XmlRootElement(name = "Tree")
public class Tree {

    @XmlElement(required = true)
    protected List<Child> child;

    public List<Child> getChild() {
        if (child == null) {
            child = new ArrayList<Child>();
        }
        return this.child;
    }
}

Child.java

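import java.util.ArrayList;
import java.util.List;

import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;

@XmlAccessorType(XmlAccessType.FIELD)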
@XmlType(name = "", propOrder = {
    "property"
})
@XmlRootElement(name = "child")
public class Child {

    @XmlElement(name = "Property", required = true)
    protected List<Property> property;

    public List<Property> getProperty() {
        if (property == null) {
            property = new ArrayList<Property>();
        }
        return this.property;
    }
}

Property.java

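import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;
import javax.xml.bind.annotation.XmlValue;

@XmlAccessorType(XmlAccessType.FIELD)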
@XmlType(name = "", propOrder = {
    "content"
})
@XmlRootElement(name = "Property")
public class Property {

    @XmlValue
    protected String content;
    @XmlAttribute(name = "Name", required = true)
    protected String name;

    public String getContent() {
        return content;
    }

    public void setContent(String value) {
        this.content = value;
    }

    public String getName() {
        return name;
    }

    public void setName(String value) {
        this.name = value;
    }
}

How to use these classes

The reading process (unmarshalling) converts the XML file into these generated data types. It uses the JAXBContext class to create an Unmarshaller and then calls its unmarshal method to convert the XML file into objects:

JAXBContext context = JAXBContext.newInstance(Tree.class); // the argument is the class of the root element
Tree xmlDoc = (Tree) context.createUnmarshaller().unmarshal(new FileReader("abc.xml")); // Reads the XML and returns a Java object

To write, you use the Java classes to store the data and create the structure. In this case, you need to create the required Property objects, the Child container for the Property elements, and the root node, which is the Tree node. You can add elements one at a time or create a list of them and add them all at once. Once the root node object is populated, simply pass it to the marshaller:

JAXBContext context = JAXBContext.newInstance(Tree.class);
Marshaller mar = context.createMarshaller();
mar.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE); // formatting the xml file
mar.marshal(tree, new File("abc.xml")); // saves the "Tree" object as "abc.xml"

All together

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.List;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;

public class JAXBDemo {
    public static void main(String[] args) {
        try {
            // write
            Tree tree = new Tree();
            Property prop0 = new Property();
            prop0.setName("id");
            prop0.setContent("");
            
            Property prop1 = new Property();
            prop1.setName("username");
            prop1.setContent("abc");
            
            Property prop2 = new Property();
            prop2.setName("phoneType");
            prop2.setContent("phone1");

            Property prop3 = new Property();
            prop3.setName("value");
            prop3.setContent("123456");

            List<Property> props1 = List.of(prop0, prop1, prop2, prop3);

            Property prop4 = new Property();
            prop4.setName("id");
            prop4.setContent("");
            
            Property prop5 = new Property();
            prop5.setName("username");
            prop5.setContent("def");
            
            Property prop6 = new Property();
            prop6.setName("phoneType");
            prop6.setContent("phone2");

            Property prop7 = new Property();
            prop7.setName("value");
            prop7.setContent("6789012");

            List<Property> props2 = List.of(prop4, prop5, prop6, prop7);
            
            Child child1 = new Child();
            Child child2 = new Child();
            
            child1.getProperty().addAll(props1);
            child2.getProperty().addAll(props2);
            
            tree.getChild().add(child1);
            tree.getChild().add(child2);

            JAXBContext context = JAXBContext.newInstance(Tree.class);
            Marshaller mar = context.createMarshaller();
            mar.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
            mar.marshal(tree, new File("abc.xml"));

            // read
            Tree xmlDoc = (Tree) context.createUnmarshaller().unmarshal(new FileReader("abc.xml"));
            List<Child> children = xmlDoc.getChild();
            int i = 1;
            for (Child child : children) {
                System.out.println("Child " + i++ + ":"); // each iteration prints one <child>, not one Property
                List<Property> props = child.getProperty();
                for (Property prop : props) {
                    System.out.println("Name: " + prop.getName() + "; Content: " + prop.getContent());
                }
            }
        } catch (JAXBException | FileNotFoundException e) {

            e.printStackTrace();
        }
    }
}

Last notes:

To get this to work, I had to make some "fixes" to the distribution. The first fix was to edit the xjc.bat according to this post: https://github.com/eclipse-ee4j/jaxb-ri/issues/1321. Scroll to the bottom to see the fix I applied.

Then, I needed to update my "jaxb-runtime" dependency to version 2.3.3 in order for the project to work with "jaxb-api" version 2.3.1.

Your preamble is not true. Reading XML just requires syntax. What you mention is about understanding XML and ways to detect errors early (grammar, semantic). Not all tools require this. — Queeg, Jan 25 '22 at 07:02
@HiranChaudhuri Your post itself proved my point. How can you get the value of the `id` attribute if you don't know 1) Some node contains such attribute, and 2) you know how to get to that element? That requires intimate knowledge of the XML syntax and structure. Sorry, but your comment is incorrect. — hfontanez, Jan 25 '22 at 12:38
To stay with your example: not all tools need to access the id attribute. And not all projects are big enough or strict enough to justify usage of XSD and persistence frameworks. — Queeg, Jan 25 '22 at 13:30
@HiranChaudhuri both of those statements are irrelevant. You still need to know that there is such `Tree` (root) node and that its child node is a list of `child` elements. So on and so forth. — hfontanez, Jan 25 '22 at 13:52
@HiranChaudhuri Regardless of the size of a team, an XSD or DTD makes consuming an XML document easier because they serve as a contract between consumers and producers. With schemas, you know the structure an syntax of the document, but you also know the data type of the leaf nodes and any constraints that might exist (i.e. which nodes are optional). This task is way harder to figure out using plain Document readers. — hfontanez, Jan 25 '22 at 14:01