Java: Most efficient method to iterate over all elements in a org.w3c.dom.Document?

Question

What is the most efficient way to iterate through all DOM elements in Java?

Something like this but for every single DOM elements on current org.w3c.dom.Document?

for(Node childNode = node.getFirstChild(); childNode!=null;){
    Node nextChild = childNode.getNextSibling();
    // Do something with childNode, including move or delete...
    childNode = nextChild;
}

Recursive invocation of Node.getChildNodes? http://download.oracle.com/javase/6/docs/api/org/w3c/dom/Node.html#getChildNodes%28%29 — Vance Maverick, Mar 22 '11 at 05:38
I think it's interesting that the question asked the _most efficient_ method to iterate over all elements of a `Document`, but none of the answers did any tests of efficiency, and the only mention of efficiency was "I think" or similar surmises. — Garret Wilson, Mar 14 '20 at 20:21

score 136 · Accepted Answer · edited Dec 24 '22 at 00:35

Basically you have two ways to iterate over all elements:

1. Using recursion (the most common way I think):

public static void main(String[] args) throws SAXException, IOException,
        ParserConfigurationException, TransformerException {

    DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
        .newInstance();
    DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
    Document document = docBuilder.parse(new File("document.xml"));
    doSomething(document.getDocumentElement());
}

public static void doSomething(Node node) {
    // do something with the current node instead of System.out
    System.out.println(node.getNodeName());

    NodeList nodeList = node.getChildNodes();
    for (int i = 0; i < nodeList.getLength(); i++) {
        Node currentNode = nodeList.item(i);
        if (currentNode.getNodeType() == Node.ELEMENT_NODE) {
            //calls this method for all the children which is Element
            doSomething(currentNode);
        }
    }
}

2. Avoiding recursion using getElementsByTagName() method with * as parameter:

public static void main(String[] args) throws SAXException, IOException,
        ParserConfigurationException, TransformerException {

    DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
            .newInstance();
    DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
    Document document = docBuilder.parse(new File("document.xml"));
    
    NodeList nodeList = document.getElementsByTagName("*");
    for (int i = 0; i < nodeList.getLength(); i++) {
        Node node = nodeList.item(i);
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            // do something with the current element
            System.out.println(node.getNodeName());
        }
    }
}

I think these ways are both efficient.

Passing the iteration index as an argument to the recursive function, you can made it tail-recursive, which is optimized by compiler, to avoid stack overflow. — khachik, Apr 03 '11 at 18:30
I think it's too late to avoid stack overflow. You're already here. — braden, Sep 28 '12 at 17:19
What makes you think that the creation of a node list for the whole document is efficient? This means almost copying the whole document. Or is there some kind of delayed evaluation hidden in `NodeList` optimizing sequential calls to `item`? — ceving, Mar 07 '13 at 18:17
@ceving NodeList is an interface. Implementations are free to do advanced things. The item(n) implementation in org.apache.xerces.dom.ParentNode includes a cache but it is used to speed up the lookup, not to save memory. — Ryan, Aug 20 '14 at 18:32
Go with answer #2 but change the for loop to read: for (int i = 0, len = nodeList.getLength(); i < len; i++) — Andrew, May 24 '18 at 05:39

Andrew · Answer 2 · 2018-05-24T05:38:41.517

37

for (int i = 0; i < nodeList.getLength(); i++)

change to

for (int i = 0, len = nodeList.getLength(); i < len; i++)

to be more efficient.

The second way of javanna answer may be the best as it tends to use a flatter, predictable memory model.

edited May 24 '18 at 05:38

answered Oct 04 '12 at 21:28

Andrew

761
7
9

1

You need at least 50 rep score to comment. I had the same problem and answered because I couldn't comment. Have some upvote-aid ;) – nyaray Jul 11 '13 at 16:54
The avoiding recursion solution above prevents the program from using more stack memory based on the data. Each step in recursion pushes more data into the stack. – Andrew Mar 30 '18 at 14:57

mike · Answer 3 · 2014-02-05T12:10:13.230

I also stumbled over this problem recently. Here is my solution. I wanted to avoid recursion, so I used a while loop.

Because of the adds and removes in arbitrary places on the list, I went with the LinkedList implementation.

/* traverses tree starting with given node */
  private static List<Node> traverse(Node n)
  {
    return traverse(Arrays.asList(n));
  }

  /* traverses tree starting with given nodes */
  private static List<Node> traverse(List<Node> nodes)
  {
    List<Node> open = new LinkedList<Node>(nodes);
    List<Node> visited = new LinkedList<Node>();

    ListIterator<Node> it = open.listIterator();
    while (it.hasNext() || it.hasPrevious())
    {
      Node unvisited;
      if (it.hasNext())
        unvisited = it.next();
      else
        unvisited = it.previous();

      it.remove();

      List<Node> children = getChildren(unvisited);
      for (Node child : children)
        it.add(child);

      visited.add(unvisited);
    }

    return visited;
  }

  private static List<Node> getChildren(Node n)
  {
    List<Node> children = asList(n.getChildNodes());
    Iterator<Node> it = children.iterator();
    while (it.hasNext())
      if (it.next().getNodeType() != Node.ELEMENT_NODE)
        it.remove();
    return children;
  }

  private static List<Node> asList(NodeList nodes)
  {
    List<Node> list = new ArrayList<Node>(nodes.getLength());
    for (int i = 0, l = nodes.getLength(); i < l; i++)
      list.add(nodes.item(i));
    return list;
  }

Java: Most efficient method to iterate over all elements in a org.w3c.dom.Document?

3 Answers3

Linked