
You can call Nokogiri::XML::Node#ancestors.size to see how deeply a node is nested. But is there a way to determine how deeply nested the most deeply nested child of a node is?

Alternatively, how can you find all the leaf nodes that descend from a node?

Phrogz
dan
  • Good question, +1. See my answer for a description of an XPath 1.0 -based solution and a single-XPath-2.0-expression solution. – Dimitre Novatchev Apr 17 '11 at 17:13
  • Related: [How to select all leaf nodes using XPath expression?](http://stackoverflow.com/questions/3926589/how-to-select-all-leaf-nodes-using-xpath-expression) – Phrogz Apr 17 '11 at 23:40

2 Answers


The following code monkey-patches Nokogiri::XML::Node for fun, but you can of course extract the methods as standalone functions that take a node argument if you prefer. (Only the height method answers your question, but I thought the deepest_leaves method might be interesting.)

require 'nokogiri'
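class Nokogiri::XML::Node
  # Number of ancestors, including the document node.
  def depth
    ancestors.size
    # The following is ~10x slower: xpath('count(ancestor::node())').to_i
  end
  # Descendant elements that have no element children.
  def leaves
    xpath('.//*[not(*)]').to_a
  end
  # Levels between this node and its deepest leaf; 0 if it has no element descendants.
  def height
    tallest = leaves.map{ |leaf| leaf.depth }.max
    tallest ? tallest - depth : 0
  end
  # The leaves at the greatest depth; an empty array if there are none.
  def deepest_leaves
    by_depth = leaves.group_by{ |leaf| leaf.depth }
    by_depth[ by_depth.keys.max ] || []
  end
end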

doc = Nokogiri::XML "<root>
  <a1>
    <b1></b1>
    <b2><c1><d1 /><d2><e1 /><e2 /></d2></c1><c2><d3><e3/></d3></c2></b2>
  </a1>
  <a2><b><c><d><e><f /></e></d></c></b></a2>
</root>"

a1 = doc.at_xpath('//a1')
p a1.height                      #=> 4
p a1.deepest_leaves.map(&:name)  #=> ["e1", "e2", "e3"]
p a1.leaves.map(&:name)          #=> ["b1", "d1", "e1", "e2", "e3"]

Edit: To answer just the question asked tersely, without wrapping it in re-usable pieces:

p a1.xpath('.//*[not(*)]').map{ |n| n.ancestors.size }.max - a1.ancestors.size
Phrogz
  • That's more than three times longer and many more times more-complex than the XPath 2.0 single expression! :) Very good example of the advantages of using pure XPath solutions over a PL, even if that happens to be Ruby. – Dimitre Novatchev Apr 18 '11 at 04:34
  • @Dimitre Very true. (Well...there's more in that code than is needed for just the same logic, but that's beside the point.) It will be very nice when libxml2/Nokogiri support XPath 2.0 expressions. Your excellent, terse, and correct answer just happens not to work for Nokogiri, as asked by the OP. – Phrogz Apr 18 '11 at 05:50
  • @Dimitre Actually, not true; per my edit, with almost no golfing, the Ruby code is about 1/3 less bytes than XPath 2.0. *shrug* – Phrogz Apr 18 '11 at 06:00
  • Much better -- on the 2nd attempt -- and if the programmer (even one like you) doesn't achieve this terseness on the first attempt, then they should really just use the XPath expression. It is a relief for us XSLT/XPath people not to depend on other PLs that need unnatural mental efforts in order to achieve some degree of resemblance to the natural elegance of XPath. – Dimitre Novatchev Apr 18 '11 at 12:50
  • See my update for an even shorter, one-liner XPath expression, shorter than yours :) BTW, isn't it time to reverse your downvote? :) – Dimitre Novatchev Apr 26 '22 at 02:35
  • @DimitreNovatchev Nice! Also, I don’t know when I did it, but my current vote on your post is up, not down. – Phrogz Apr 26 '22 at 02:51
  • Sorry for even thinking this might be you. Please, forget it. And this isn't really important! BTW, there is now SaxonC, implementing even XSLT 3.0 / XPath 3.1. It can be used from Python, maybe from Ruby, too? – Dimitre Novatchev Apr 26 '22 at 04:15

You can call Nokogiri::XML::Node#ancestors.size to see how deeply a node is nested. But is there a way to determine how deeply nested the most deeply nested child of a node is?

Use:

count(ancestor::node())

This expression gives the number of ancestors that the context (current) node has in the document hierarchy.

To find the nesting level of the "most deeply nested child" one must first determine all "leaf" nodes:

descendant-or-self::node()[not(node())]

and for each of them get their nesting level using the above XPath expression.

Then the maximum nesting level has to be calculated (the maximum of all the numbers produced), and this last calculation is not possible with pure XPath 1.0.

The whole calculation can be expressed as a single XPath 2.0 expression:

max(for $leaf in /descendant-or-self::node()[not(node())],
        $depth in count($leaf/ancestor::node())
      return
        $depth
    )

Update:

It is possible to shorten this XPath 2.0 expression even more:

max(/descendant-or-self::node()[not(node())]/count(ancestor::node()))
Dimitre Novatchev
  • Is Nokogiri currently capable of calculating this XPath 2.0 expression? – dan Apr 17 '11 at 18:07
  • @dan I don't know what Nokogiri is exactly. If it is just an XPath 1.0 engine, then it by itself cannot calculate the needed maximum. In case Nokogiri is an API to be used within a specific programming language, then this can be done by a program written in this specific programming language and using Nokogiri to evaluate the nesting level of each leaf node. – Dimitre Novatchev Apr 17 '11 at 18:26
  • @dan Don't be intimidated by Dimitre's high rep and impressive knowledge of XPath; if he didn't answer your question, don't accept it! :) _With **much** due respect to Dimitre for his knowledge and assistance._ – Phrogz Apr 17 '11 at 23:23
  • BTW, congratulations on 45k, Dimitre! :) – Phrogz Apr 17 '11 at 23:53
  • @dan: Of course, @Phrogz is right that you have the freedom to always accept the best and most useful answer. – Dimitre Novatchev Apr 18 '11 at 04:28
  • @Phrogz: You are implying that I didn't answer @dan's question. This is not true. 1. I have provided an XPath 1.0 expression that he can use immediately to get the answer to one of his questions `"how can you find all the leaf nodes that descend from a node"`. 2. I have given him an XPath 2.0 expression that answers his other question. 3. I have explained to him how to combine the 1st XPath 1.0 expression with his programming language in order to construct the complete solution, in case he cannot use XPath 2.0. Add the upvotes received -- what better confirmation of the value of this answer? – Dimitre Novatchev Apr 18 '11 at 13:00
  • @Dimitre You appear to be correct, as you have the upvotes and the acceptance mark. In my opinion if someone asks "How can I do _x_ in _language/tool y_?" then an answer which says "I don't know _language/tool y_, but here are a few steps that might help" is helpful, but does not answer the question. That, combined with @dan asking about whether or not XPath 2.0 works (it does not, for his needs) led me to believe that he did not have a working solution. – Phrogz Apr 18 '11 at 13:08
  • @Phrogz: Agreed, and no offense taken. Many questions don't receive the "ideal" answer the asker was looking for, but enough information for him to solve his problem. It is incorrect to belittle the value of such answers. – Dimitre Novatchev Apr 18 '11 at 13:26
  • @Dimitre I am glad that no offense was taken. In retrospect, I can see how my comment would easily be interpreted in a belittling manner. My intent was lighthearted humor, but a small :) is sometimes not enough for certain forms of commentary. – Phrogz Apr 18 '11 at 13:40