I want to read in an XML file and add an incrementing id to specific elements. Here is some test code I wrote to figure out how to do that:
import scala.xml._
import scala.xml.transform._
val testXML =
<document>
<authors>
<author>
<first-name>Firstname</first-name>
<last-name>Lastname</last-name>
</author>
</authors>
</document>
def addIDs(node: Node): Node = {
  object addIDs extends RewriteRule {
    var authorID = -1
    var emailID = -1
    var instID = -1

    override def transform(elem: Node): Seq[Node] = {
      elem match {
        case Elem(prefix, "author", attribs, scope, _*) =>
          //println("element author: " + elem.text)
          if ((elem \ "@id").isEmpty) {
            println("element id is empty:" + elem \ "@id")
            authorID += 1
            println("authorID is " + authorID)
            elem.asInstanceOf[Elem] % Attribute(None, "id", Text(authorID.toString), Null)
          } else {
            elem
          }
        case Elem(prefix, "email", attribs, scope, _*) =>
          println("EMAIL")
          elem.asInstanceOf[Elem] % Attribute(None, "id", Text(authorID.toString), Null)
        case Elem(prefix, "institution", attribs, scope, _*) =>
          println("INST")
          elem.asInstanceOf[Elem] % Attribute(None, "id", Text(instID.toString), Null)
        case other =>
          other
      }
    }
  }

  object transform extends RuleTransformer(addIDs)
  transform(node)
}
val newXML = addIDs(testXML)
This code runs, but the ids don't come out as expected:
element id is empty:
authorID is 0
element id is empty:
authorID is 1
element id is empty:
authorID is 2
element id is empty:
authorID is 3
element id is empty:
authorID is 4
element id is empty:
authorID is 5
element id is empty:
authorID is 6
element id is empty:
authorID is 7
newXML: scala.xml.Node = <document>
<authors>
<author id="7">
<first-name>Firstname</first-name>
<last-name>Lastname</last-name>
</author>
</authors>
</document>
It looks like the transformer visits each node multiple times, incrementing the id on every visit, and only stops once the id has reached 7. Why does it touch the node so many times before finishing with it? Is there something I could do differently to tell it that it is done with that node?
I thought it might be re-traversing the newly modified node, hence my check for whether the element already has an attribute named 'id', but that doesn't seem to help. Maybe it's a bad idea to do this in the first place?
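For comparison, here is a rough sketch of what a version without RuleTransformer might look like: a plain recursive walk that visits each node exactly once, so the counter cannot be bumped by repeated visits. This is only an illustration under assumptions, not a fix for the code above: `addAuthorIds`, `walk`, `nextId` and the `sample` document are hypothetical names I made up, it only handles `author` elements, and it assumes a scala-xml version where `Elem.copy` is available (run in the REPL or as a script).

```scala
import scala.xml._

// Hypothetical helper (not part of the test code above): walks the tree once,
// depth-first, and adds an incrementing id to each <author> that lacks one.
def addAuthorIds(root: Node): Node = {
  var nextId = 0 // local counter, advanced once per numbered element

  def walk(node: Node): Node = node match {
    case e: Elem =>
      // Number the element before descending so ids follow document order.
      val numbered =
        if (e.label == "author" && e.attribute("id").isEmpty) {
          val withId = e % Attribute(None, "id", Text(nextId.toString), Null)
          nextId += 1
          withId
        } else e
      // Rebuild the element with its children processed the same way.
      numbered.copy(child = numbered.child.map(walk))
    case other => other // text, comments, etc. are returned unchanged
  }

  walk(root)
}

// Made-up sample input: two authors without an id, one that already has one.
val sample =
  <document><authors><author><first-name>A</first-name></author><author id="x"/><author/></authors></document>

val result = addAuthorIds(sample)
```

Because each element is handled in a single pass, the existing-id check only has to guard against ids already present in the input, not against the transformer seeing its own output again.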
Thanks for any help with this.