
I know that there are dozens of ways to select the first child element in Nokogiri, but which is the cheapest?

I can't see a way around using Node#children, which sounds awfully expensive: if there are 10,000 child nodes, I don't want to touch the other 9,999.

the Tin Man
Steinbitglis

3 Answers


You can try it yourself and benchmark the result.

I created a quick benchmark: http://gist.github.com/283825

$ ruby test.rb 
Rehearsal ---------------------------------------------------
xpath/first()     3.290000   0.030000   3.320000 (  3.321197)
xpath.first       3.360000   0.010000   3.370000 (  3.381171)
at                4.540000   0.020000   4.560000 (  4.564249)
at_xpath          3.420000   0.010000   3.430000 (  3.430933)
children.second   0.220000   0.010000   0.230000 (  0.233090)
----------------------------------------- total: 14.910000sec

                      user     system      total        real
xpath/first()     3.280000   0.000000   3.280000 (  3.288647)
xpath.first       3.350000   0.020000   3.370000 (  3.374778)
at                4.530000   0.040000   4.570000 (  4.580512)
at_xpath          3.410000   0.010000   3.420000 (  3.421551)
children.second   0.220000   0.010000   0.230000 (  0.226846)

From my tests, children appears to be the fastest method.

Simone Carletti
  • The first four approaches you tried use XPath, which is very slow. The children approach, as mentioned in the question, parses the whole parent node, which is also unacceptable. Try them out with 100 times as many nodes and 1/100 as many tests. – Steinbitglis Jan 22 '10 at 16:46
  • Thanks for showing me the benchmark library by the way... I think it might be veeeery useful in the future :-) – Steinbitglis Jan 22 '10 at 16:48
  • I'm not sure this tells us anything really usable. Using `children` is doable, but in real life trying to find a node by index doesn't seem like it'd help much unless it's to ensure the file was created correctly; for normal uses it is a slow and painful approach that tosses away the value of using a parser. It'd be easier to read the XML as text and search each line instead. Take advantage of the ability to use XPath or CSS and the coding time and overall processing speed will go up exponentially over walking a big XML or HTML document node by node. – the Tin Man Feb 07 '20 at 20:38

Node#child is the fastest way to get the first child element.

However, if the node you're looking for is NOT the first, perhaps the 99th, then there is no faster way to select that node than to call children and index into it.

You are correct in stating that it's expensive to build a NodeSet for all children if you only want the first one.

One limiting factor is that libxml2 (the XML library underlying Nokogiri) stores a node's children as a linked list. So you'll need to traverse the list (O(n)) to select the desired child node.

It would be feasible to write a method to simply return the nth child, without instantiating a NodeSet or even Ruby objects for all the other children. My advice would be to open a feature request, or send an email to the Nokogiri mailing list.

the Tin Man
Mike Dalessio

An approach that neither uses XPath nor results in parsing the whole parent is to combine Node#child, Node#next_sibling, and Node#element?.

Something like this...

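def first(node)
  # Start at the first child node, which may be text or a comment.
  element = node.child
  # Follow sibling links until an element node (or the end of the list) is reached.
  element = element.next_sibling while element && !element.element?
  # Either the first element child, or nil if there is none.
  element
end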
the Tin Man
Steinbitglis