
We have a large XML file representing many different items in a data set. None of the XML uses attributes, and each item in the XML maps directly to an ActiveRecord model.

The simplest solution, to me, seems to be to convert each item to a Ruby hash. XPath seems only to complicate the conversion of each piece of data to the ActiveRecord model.
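For concreteness, something like this, using ActiveSupport's Hash.from_xml (already available alongside ActiveRecord) on a made-up item:

require 'active_support/core_ext/hash/conversions'

xml = '<item><name>Widget</name><price>9.99</price></item>'
Hash.from_xml(xml)
# => {"item" => {"name" => "Widget", "price" => "9.99"}}
# (note that all leaf values come back as strings)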

However, Nokogiri is insanely popular, so I'm trying to understand whether there's a benefit I'm missing.

As far as I can see, the two benefits of Nokogiri (or other XPath-based parsers) are query-ability and handling of attributes.
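By query-ability I mean selecting nodes anywhere in the document with a single expression, e.g. with Nokogiri (contrived document):

require 'nokogiri'

doc = Nokogiri::XML('<items><item><name>A</name></item><item><name>B</name></item></items>')
doc.xpath('//item/name').map(&:text)
# => ["A", "B"]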

1) Are there other benefits I'm missing here?

2) For our use case, which is parsing XML and converting those individual documents into ActiveRecord models, is there any reason not to just convert the document to a Ruby hash for easier handling (see the sketch below)?
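The mapping I have in mind would look roughly like this (Item is a placeholder for one of our models):

require 'active_support/core_ext/hash/conversions'

attrs = Hash.from_xml('<item><name>Widget</name><price>9.99</price></item>')['item']
Item.create!(attrs) # Item is a placeholder; ActiveRecord casts the string values per column type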

– jtmarmon
Most XML parsers – including Nokogiri – also provide a SAX mode. SAX parsing might be best suited for your use case. See, for example, [here](http://stackoverflow.com/questions/6828703/what-is-the-difference-between-sax-and-dom) for details. – Mario Zannone Aug 03 '15 at 23:12

1 Answer


For huge XML files you should use SAX parsing. That way, the file won't be loaded into memory as a whole; instead, your handler receives a callback every time the parser hits a node.

For the fastest performance, I recommend Ox.

Example from the documentation:

require 'stringio'
require 'ox'

# Each callback fires as the parser walks the document in order.
class Sample < ::Ox::Sax
  def start_element(name); puts "start: #{name}";        end
  def end_element(name);   puts "end: #{name}";          end
  def attr(name, value);   puts "  #{name} => #{value}"; end
  def text(value);         puts "text #{value}";         end
end

io = StringIO.new(%{
<top name="sample">
  <middle name="second">
    <bottom name="third"/>
  </middle>
</top>
})

handler = Sample.new
Ox.sax_parse(handler, io)
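
Applied to your case, a minimal sketch might buffer each record's child elements into a hash and create the model when the record's closing tag arrives (Item and the :item tag name are placeholders for your schema):

require 'ox'

class ItemHandler < ::Ox::Sax
  def initialize
    @attrs = {}
    @current = nil
  end

  def start_element(name)
    @attrs = {} if name == :item # new record: reset the buffer
    @current = name
  end

  def text(value)
    # ignore whitespace between tags; store element text under its tag name
    @attrs[@current.to_s] = value if @current && !value.strip.empty?
  end

  def end_element(name)
    Item.create!(@attrs) if name == :item # Item is a placeholder model
    @current = nil
  end
end

File.open('items.xml') { |f| Ox.sax_parse(ItemHandler.new, f) }

This keeps memory usage proportional to a single record rather than to the whole file.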
– EugZol