I need to process XML documents that consist of a very large number of independent records, e.g.
    <employees>
      <employee>
        <firstName>Kermit</firstName>
        <lastName>Frog</lastName>
        <role>Singer</role>
      </employee>
      <employee>
        <firstName>Oscar</firstName>
        <lastName>Grouch</lastName>
        <role>Garbageman</role>
      </employee>
      ...
    </employees>
In some cases these are just big files, but in others they may come from a streaming source.
I can't just call scala.xml.XML.load() on it, because I don't want to hold the whole document in memory (or wait for the input stream to close) when I only need to work with one record at a time. I know I can use XMLEventReader to stream the input as a sequence of XMLEvents, but these are much less convenient to work with than scala.xml.Node.
So I'd like to get a lazy Iterator[Node] out of this somehow, in order to operate on each individual record using the convenient Scala syntax, while keeping memory usage under control.
To do this myself, I could start with an XMLEventReader, buffer the events between each matching start and end tag, and then construct a Node tree from that buffer. But is there an easier way that I've overlooked? Thanks for any insights!
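For concreteness, here is a rough sketch of the do-it-yourself approach I have in mind. It assumes scala.xml.pull.XMLEventReader from scala-xml 1.x (which I believe is deprecated and absent from later releases), and the names RecordStream, records and readElem are made up for illustration. Rather than keeping an explicit event buffer, it rebuilds each record recursively as the events arrive, and it silently drops comments and processing instructions.

```scala
import scala.io.Source
import scala.xml.{Elem, EntityRef, Node, Text}
import scala.xml.pull.{EvElemEnd, EvElemStart, EvEntityRef, EvText, XMLEvent, XMLEventReader}

object RecordStream {

  // Rebuilds one element from the events that follow its start tag,
  // consuming everything up to and including the matching end tag.
  private def readElem(start: EvElemStart, events: Iterator[XMLEvent]): Elem = {
    val children = List.newBuilder[Node]
    var closed = false
    while (!closed && events.hasNext) {
      events.next() match {
        case s: EvElemStart    => children += readElem(s, events) // nested element
        case EvText(text)      => children += Text(text)
        case EvEntityRef(name) => children += EntityRef(name)
        case _: EvElemEnd      => closed = true // end tag of this element
        case _                 => // comments and processing instructions are dropped
      }
    }
    Elem(start.pre, start.label, start.attrs, start.scope, true, children.result(): _*)
  }

  // Lazily yields every element named `label`, one at a time.
  def records(source: Source, label: String): Iterator[Node] = {
    val events = new XMLEventReader(source)
    new Iterator[Node] {
      private var pending: Option[Node] = None

      // Skips events until the next matching start tag, then builds that record.
      private def advance(): Unit =
        while (pending.isEmpty && events.hasNext) {
          events.next() match {
            case s @ EvElemStart(_, `label`, _, _) => pending = Some(readElem(s, events))
            case _                                 => // ignore everything outside a record
          }
        }

      def hasNext: Boolean = { advance(); pending.isDefined }

      def next(): Node = {
        advance()
        val node = pending.getOrElse(throw new NoSuchElementException("no more records"))
        pending = None
        node
      }
    }
  }
}
```

With that in place, something like RecordStream.records(Source.fromFile("employees.xml"), "employee").foreach(e => println((e \ "lastName").text)) would only ever hold one record in memory (the file name is just a placeholder). It works on my toy input, but it feels like I am reinventing something that should already exist.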