I have a very large XML file (300 MB) in the following format:
<data>
  <point>
    <id><![CDATA[1371308]]></id>
    <time><![CDATA[15:36]]></time>
  </point>
  <point>
    <id><![CDATA[1371308]]></id>
    <time><![CDATA[15:36]]></time>
  </point>
  <point>
    <id><![CDATA[1371308]]></id>
    <time><![CDATA[15:36]]></time>
  </point>
</data>
Now I need to read it and iterate through the point nodes, doing something for each. Currently I'm doing it with Nokogiri like this:
require 'nokogiri'

# This parses the entire document into an in-memory DOM before iterating
xmlfeed = Nokogiri::XML(File.open("large_file.xml"))
xmlfeed.xpath("./data/point").each do |item|
  save_id(item.xpath("./id").text)
end
However, that's not very efficient: it parses the whole file in one go and therefore creates a huge memory footprint (several GB).
Is there a way to do this in chunks instead? I believe that's called streaming, if I'm not mistaken.
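As far as I can tell, Nokogiri ships a SAX parser that streams through the file and fires callbacks instead of building a tree. A rough sketch of what that might look like for this file (the PointHandler class, the ids accumulator and the reliance on cdata_block are my own guesses, not code taken from an answer):

require 'nokogiri'

# SAX handler: callbacks fire as the parser streams through the file,
# so the whole document is never held in memory at once.
class PointHandler < Nokogiri::XML::SAX::Document
  attr_reader :ids

  def initialize
    @ids = []        # everything gets collected here for later processing
    @current = nil   # name of the element we are currently inside
  end

  def start_element(name, attrs = [])
    @current = name
  end

  # The values in this file are wrapped in CDATA, so they arrive here
  def cdata_block(string)
    @ids << string if @current == "id"
  end

  def end_element(name)
    @current = nil
  end
end

handler = PointHandler.new
Nokogiri::XML::SAX::Parser.new(handler).parse(File.open("large_file.xml"))
handler.ids.each { |id| save_id(id) }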
EDIT
The suggested answer using Nokogiri's SAX parser might be okay, but it gets very messy when there are several nodes within each point that I need to extract content from and process differently. Instead of returning a huge array of entries for later processing, I would much prefer to access one point at a time, process it, and then move on to the next, "forgetting" the previous one.
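To make that concrete, something with roughly this shape is what I'm after. This is only a sketch of the idea using Nokogiri::XML::Reader; re-parsing each point's outer_xml into a small throwaway fragment is my own guess at how to get per-point access, and I'm not sure it's the idiomatic approach:

require 'nokogiri'

# Stream the file node by node; only the current <point> is materialised
reader = Nokogiri::XML::Reader(File.open("large_file.xml"))

reader.each do |node|
  # Only react to opening <point> elements
  next unless node.name == "point" &&
              node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT

  # Parse just this one point into a small document of its own
  point = Nokogiri::XML(node.outer_xml).at("./point")

  save_id(point.at("./id").text)
  # ...read <time> and any other children here and process them differently...

  # nothing is kept around, so the previous point is effectively "forgotten"
end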