Scala - modifying nested elements in xml

Question

I'm learning Scala, and I'm looking to update a nested node in some XML. I've got something working, but I'm wondering if it's the most elegant way.

I have some xml:

val InputXml : Node =
<root>
    <subnode>
        <version>1</version>
    </subnode>
    <contents>
        <version>1</version>
    </contents>
</root>

I want to update the version node in subnode, but not the one in contents.

Here is my function:

def updateVersion( node : Node ) : Node = 
 {
   def updateElements( seq : Seq[Node]) : Seq[Node] = 
   {
        var subElements = for( subNode <- seq ) yield
        {
            updateVersion( subNode )
        }   
        subElements
   }

   node match
   {
     case <root>{ ch @ _* }</root> =>
     {
        <root>{ updateElements( ch ) }</root>
     }
     case <subnode>{ ch @ _* }</subnode> =>
     {
         <subnode>{ updateElements( ch ) }</subnode> 
     }
     case <version>{ contents }</version> =>
     {
        <version>2</version>
     }
     case other @ _ => 
     {
         other
     }
   }
 }

Is there a more succinct way of writing this function?

very weird and lengthy formatting style... suggest using something more resemblant of the standard coding style; your IDE/editor should have built-in formatting, you can start with that. — Erik Kaplun, Jun 23 '14 at 18:49

Daniel C. Sobral · Answer 1 · 2009-08-20T20:17:11.613

All this time, and no one actually gave the most appropriate answer! Now that I have learned of it, though, here's my new take on it:

import scala.xml._
import scala.xml.transform._

object t1 extends RewriteRule {
  override def transform(n: Node): Seq[Node] = n match {
    case Elem(prefix, "version", attribs, scope, _*)  =>
      Elem(prefix, "version", attribs, scope, Text("2"))
    case other => other
  }
}

object rt1 extends RuleTransformer(t1)

object t2 extends RewriteRule {
  override def transform(n: Node): Seq[Node] = n match {
    case sn @ Elem(_, "subnode", _, _, _*) => rt1(sn)
    case other => other
  }
}

object rt2 extends RuleTransformer(t2)

rt2(InputXml)

Now, for a few explanations. The class RewriteRule is abstract, so we can't instantiate it directly. It defines two methods, both called transform: one takes a single Node, the other a Seq[Node]. By overriding one of the transform methods in a subclass (here, the singleton objects t1 and t2), we get a concrete rule. Each RewriteRule need concern itself with only a single task, though it can do many.

Next, the class RuleTransformer takes a variable number of RewriteRule instances as parameters. Its transform method takes a Node and returns a Seq[Node], obtained by applying each and every RewriteRule used to instantiate it.
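As a minimal sketch of a RuleTransformer built from more than one rule, assuming scala-xml is on the classpath: the two rules and the names RuleDemo, bumpVersion and dropContents are made up for illustration and are not part of the answer. Here the rules are written as anonymous subclasses with `new RewriteRule { ... }` rather than as named objects.

```scala
import scala.xml._
import scala.xml.transform._

object RuleDemo {
  // Hypothetical rule 1: replace the content of every <version> with the text "2".
  val bumpVersion: RewriteRule = new RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case e: Elem if e.label == "version" => e.copy(child = Seq(Text("2")))
      case other                           => other
    }
  }

  // Hypothetical rule 2: remove every <contents> element by returning an empty sequence.
  val dropContents: RewriteRule = new RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case e: Elem if e.label == "contents" => NodeSeq.Empty
      case other                            => other
    }
  }

  val input: Node =
    <root><subnode><version>1</version></subnode><contents><version>1</version></contents></root>

  // Both rules are handed to a single RuleTransformer.
  val transformer = new RuleTransformer(bumpVersion, dropContents)

  // apply returns the rewritten root node.
  val result: Node = transformer(input)
}
```

Because a rule returns a Seq[Node], it can also delete a node (empty sequence) or replace it with several nodes, not just rewrite it one-to-one.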

Both classes derive from BasicTransformer, which defines a few methods with which one need not concern oneself at a higher level. Its apply method calls transform, though, so both RuleTransformer and RewriteRule can use the syntactic sugar associated with it. In the example, the former does and the latter does not.

Here we use two levels of RuleTransformer: the first applies a filter to higher-level nodes, and the second applies the change to whatever passes the filter.

The extractor Elem is also used, so that there is no need to concern oneself with details such as the namespace or whether there are attributes or not. Note that the content of the element version is completely discarded and replaced with 2. It can be matched against too, if needed.

Note also that the last parameter of the extractor is _*, and not _. That means these elements can have multiple children. If you forget the *, the match may fail. In the example, the match would not fail if there were no whitespace. Because whitespace is translated into Text nodes, a single whitespace under subnode would cause the match to fail.
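The difference between _ and _* can be seen in a small sketch like the following. The object and method names are made up for illustration; it assumes XML literals keep their whitespace as Text children, as described above.

```scala
import scala.xml._

object ChildPatternDemo {
  // No whitespace: <subnode> has exactly one child, the <version> element.
  val compact: Node = <subnode><version>1</version></subnode>

  // Indented: the line breaks and spaces become extra Text children.
  val spaced: Node =
    <subnode>
      <version>1</version>
    </subnode>

  // `_` in the last position matches exactly one child.
  def matchesSingleChild(n: Node): Boolean = n match {
    case Elem(_, "subnode", _, _, _) => true
    case _                           => false
  }

  // `_*` in the last position matches any number of children.
  def matchesAnyChildren(n: Node): Boolean = n match {
    case Elem(_, "subnode", _, _, _*) => true
    case _                            => false
  }
}
```

With these definitions, matchesSingleChild accepts compact but rejects spaced, while matchesAnyChildren accepts both.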

This code is bigger than the other suggestions presented, but it has the advantage of having much less knowledge of the structure of the XML than the others. It changes any element called version that is below -- no matter how many levels -- an element called subnode, no matter namespaces, attributes, etc.
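To illustrate that claim, here is a sketch of the same two-level idea run against a deeper, attribute-carrying input. The input document and all names in NestedDemo are invented for this example. It uses Elem.copy instead of Elem(...) so that it does not depend on which Elem.apply overloads a given scala-xml version provides.

```scala
import scala.xml._
import scala.xml.transform._

object NestedDemo {
  // Inner rule: rewrite every <version>, whatever its prefix or attributes.
  private val setVersion: RewriteRule = new RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case e: Elem if e.label == "version" => e.copy(child = Seq(Text("2")))
      case other                           => other
    }
  }
  private val inner = new RuleTransformer(setVersion)

  // Outer rule: only <subnode> elements are handed to the inner transformer.
  private val onlyUnderSubnode: RewriteRule = new RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case e: Elem if e.label == "subnode" => inner(e)
      case other                           => other
    }
  }
  val outer = new RuleTransformer(onlyUnderSubnode)

  // Hypothetical input: <version> sits three levels below <subnode> and has an attribute.
  val input: Node =
    <root><subnode id="x"><a><b><version kind="beta">1</version></b></a></subnode><contents><version>1</version></contents></root>

  val result: Node = outer(input)
}
```

The version under subnode is rewritten and keeps its kind attribute, while the version under contents is left alone.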

Furthermore... well, if you have many transformations to do, recursive pattern matching quickly becomes unwieldy. Using RewriteRule and RuleTransformer, you can effectively replace XSLT files with Scala code.

Brilliant, just what I need. Incidentally, the same idea can be applied to case classes and collections with Kiama: http://stackoverflow.com/questions/3900307/cleaner-way-to-update-nested-structures/3900498#3900498 — retronym, Oct 11 '10 at 17:05
Be very careful using this code as scala.xml transformers have exponential complexity on the nesting level. This is a long standing issue that seems unlikely to get fixed, https://issues.scala-lang.org/browse/SI-3689 — Caoilte, Mar 12 '15 at 17:12
@daniel It's some 7 years down the track, would you still do it this way? I ask as if you had an XML doc which needed a few dozen transformations on an xml file at several nodes deep, would not you end up writing a lot of code to do some very basic transformations? — neurozen, Jul 04 '16 at 12:52
@neurozen I've been fortunate to _not_ having to write code that does XML transformations -- or, in fact, use much XML at all. If I had to, however, I'd look at one of the alternative XML libraries for Scala, since they are much heavier duty than what _used to come_ with Scala's standard library (but has since been split apart). — Daniel C. Sobral, Jul 05 '16 at 21:50
@Caoilte the exponential runtime is fixed in scala-xml v1.0.6 which was just released — EdgeCaseBerg, Sep 21 '16 at 10:49
I was looking at ways to manipulate XML documents and read that `RuleTransformer` has complexity of `2^n`. read http://blog.wix.engineering/2015/06/28/a-tale-of-two-xml-transformations-2/. Is it still a choice for manipulating XML? — daydreamer, Dec 01 '16 at 21:14
@daydreamer I think it was fixed, but I haven't paid attention to it in a long while now. — Daniel C. Sobral, Dec 07 '16 at 23:36

score 13 · Answer 2 · answered Jan 06 '11 at 05:38

You can use Lift's CSS Selector Transforms and write:

"subnode" #> ("version *" #> 2)

See http://stable.simply.liftweb.net/#sec:CSS-Selector-Transforms

David Pollak


A complete example: `("subnode" #> something-else)(InputXml)` – KajMagnus Aug 25 '11 at 12:22
In more recent versions `apply` would have to be written out: `"subnode" #> something-else apply inputXml` – nafg Feb 18 '13 at 05:57
Lift's CSS3 support seems to be rather restricted though, for example the child selector in `div > div` is not supported; also, none of the pseudo functions such as `:nth/first/last-child` seem to work. – Erik Kaplun Jun 26 '14 at 19:00

GClaramunt · Accepted Answer · 2009-06-10T20:29:26.610

I think the original logic is good. This is the same code with (shall I dare to say?) a more Scala-ish flavor:

def updateVersion( node : Node ) : Node = {
   def updateElements( seq : Seq[Node]) : Seq[Node] = 
     for( subNode <- seq ) yield updateVersion( subNode )  

   node match {
     case <root>{ ch @ _* }</root> => <root>{ updateElements( ch ) }</root>
     case <subnode>{ ch @ _* }</subnode> => <subnode>{ updateElements( ch ) }</subnode>
     case <version>{ contents }</version> => <version>2</version>
     case other @ _ => other
   }
 }

It looks more compact (but is actually the same :) )

I got rid of all the unnecessary brackets
If a bracket is needed, it starts in the same line
updateElements just defines a var and returns it, so I got rid of that and returned the result directly

If you want, you can get rid of updateElements too. You want to apply updateVersion to all the elements of the sequence, and that is what the map method does. With that, you can replace the line

case <subnode>{ ch @ _* }</subnode> => <subnode>{ updateElements( ch ) }</subnode>

with

case <subnode>{ ch @ _* }</subnode> => <subnode>{ ch.map(updateVersion (_)) }</subnode>

As updateVersion takes only one parameter, I'm 99% sure you can omit the placeholder and write:

case <subnode>{ ch @ _* }</subnode> => <subnode>{ ch.map(updateVersion) }</subnode>

And end with:

def updateVersion( node : Node ) : Node = node match {
  case <root>{ ch @ _* }</root> => <root>{ ch.map(updateVersion) }</root>
  case <subnode>{ ch @ _* }</subnode> => <subnode>{ ch.map(updateVersion) }</subnode>
  case <version>{ contents }</version> => <version>2</version>
  case other @ _ => other
}

What do you think?

Heh. I hadn't seen your remark on Elem elsewhere. It might be a bit more verbose, but I think it's a hell of a lot more elegant than the alternative, if you consider that the XML may well be much more complex than that. — Daniel C. Sobral, Jul 06 '09 at 22:38

Daniel C. Sobral · Answer 4 · 2009-08-20T14:13:02.520

I have since learned more and presented what I deem to be a superior solution in another answer. I have also fixed this one, as I noticed I was failing to account for the subnode restriction.

Thanks for the question! I just learned some cool stuff when dealing with XML. Here is what you want:

def updateVersion(node: Node): Node = {
  def updateNodes(ns: Seq[Node], mayChange: Boolean): Seq[Node] =
    for(subnode <- ns) yield subnode match {
      case <version>{ _ }</version> if mayChange => <version>2</version>
      case Elem(prefix, "subnode", attribs, scope, children @ _*) =>
        Elem(prefix, "subnode", attribs, scope, updateNodes(children, true) : _*)
      case Elem(prefix, label, attribs, scope, children @ _*) =>
        Elem(prefix, label, attribs, scope, updateNodes(children, mayChange) : _*)
      case other => other  // preserve text
    }

  updateNodes(node.theSeq, false)(0)
}

Now, explanation. First and last case statements should be obvious. The last one exists to catch those parts of an XML which are not elements. Or, in other words, text. Note in the first statement, though, the test against the flag to indicate whether version may be changed or not.

The second and third case statements will use a pattern matcher against the object Elem. This will break an element into all its component parts. The last parameter, "children @ _*", will match children to a list of anything. Or, more specifically, a Seq[Node]. Then we reconstruct the element, with the parts we extracted, but pass the Seq[Node] to updateNodes, doing the recursion step. If we are matching against the element subnode, then we change the flag mayChange to true, enabling the change of the version.

In the last line, we use node.theSeq to generate a Seq[Node] from Node, and (0) to get the first element of the Seq[Node] returned as result. Since updateNodes is essentially a map function (for ... yield is translated into map), we know the result will only have one element. We pass a false flag to ensure that no version will be changed unless a subnode element is an ancestor.
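As a variation, the theSeq and (0) round trip can be avoided by recursing on single nodes instead of sequences. The following is only a sketch under that assumption, not the code from this answer. It relies on Elem.copy to rebuild elements, and the name UpdateVersionSketch is made up.

```scala
import scala.xml._

object UpdateVersionSketch {
  // mayChange becomes true once a <subnode> ancestor has been seen.
  def updateVersion(node: Node, mayChange: Boolean = false): Node = node match {
    case e: Elem if e.label == "version" && mayChange =>
      // Keep prefix, attributes and scope; replace only the children.
      e.copy(child = Seq(Text("2")))
    case e: Elem =>
      val allow = mayChange || e.label == "subnode"
      e.copy(child = e.child.map(c => updateVersion(c, allow)))
    case other =>
      other // text, comments and other non-element nodes are preserved
  }

  val input: Node =
    <root><subnode><version>1</version></subnode><contents><version>1</version></contents></root>

  val result: Node = updateVersion(input)
}
```

The flag plays the same role as in the code above: it starts out false and flips to true only below a subnode element.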

There is a slightly different way of doing it, that's more powerful but a bit more verbose and obscure:

def updateVersion(node: Node): Node = {
  def updateNodes(ns: Seq[Node], mayChange: Boolean): Seq[Node] =
    for(subnode <- ns) yield subnode match {
      case Elem(prefix, "version", attribs, scope, Text(_)) if mayChange => 
        Elem(prefix, "version", attribs, scope, Text("2"))
      case Elem(prefix, "subnode", attribs, scope, children @ _*) =>
        Elem(prefix, "subnode", attribs, scope, updateNodes(children, true) : _*)
      case Elem(prefix, label, attribs, scope, children @ _*) =>
        Elem(prefix, label, attribs, scope, updateNodes(children, mayChange) : _*)
      case other => other  // preserve text
    }

  updateNodes(node.theSeq, false)(0)
}

This version allows you to change any "version" tag, whatever its prefix, attribs and scope.

score 3 · Answer 5 · answered Nov 03 '11 at 21:26

Scales Xml provides tools for "in place" edits. Of course it's all immutable, but here's the solution in Scales:

val subnodes = top(xml).\*("subnode"l).\*("version"l)
val folded = foldPositions( subnodes )( p => 
  Replace( p.tree ~> "2"))

The XPath-like syntax is a Scales signature feature; the l after the string specifies that it should have no namespace (local name only).

foldPositions iterates over the resulting elements and transforms them, joining the results back together.

wouldn't `top(xml) \* "subnode"l \* "version"l` be more readable? — Erik Kaplun, Jun 24 '14 at 13:35

score 1 · Answer 6 · answered Apr 29 '13 at 07:04

One approach would be lenses (e.g. scalaz's). See http://arosien.github.io/scalaz-base-talk-201208/#slide35 for a very clear presentation.
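To give a rough idea of what that could look like, here is a hand-rolled sketch. It does not use scalaz's actual lens API, and it assumes the XML has first been mapped to case classes. The Lens class and the Root and Section models are hypothetical.

```scala
object LensSketch {
  // A hand-rolled lens: a getter plus a copying setter for one field.
  final case class Lens[S, A](get: S => A, set: (S, A) => S) {
    // Apply a function to the focused value and return the updated whole.
    def modify(s: S)(f: A => A): S = set(s, f(get(s)))

    // Compose with a lens that looks further inside A.
    def andThen[B](next: Lens[A, B]): Lens[S, B] =
      Lens[S, B](s => next.get(get(s)), (s, b) => set(s, next.set(get(s), b)))
  }

  // Hypothetical model mirroring the XML from the question.
  final case class Section(version: Int)
  final case class Root(subnode: Section, contents: Section)

  val subnodeL: Lens[Root, Section] =
    Lens[Root, Section](_.subnode, (r, s) => r.copy(subnode = s))
  val versionL: Lens[Section, Int] =
    Lens[Section, Int](_.version, (s, v) => s.copy(version = v))

  // Focus on the version inside subnode only; contents is never touched.
  val subnodeVersionL: Lens[Root, Int] = subnodeL.andThen(versionL)

  val before: Root = Root(Section(1), Section(1))
  val after: Root  = subnodeVersionL.set(before, 2)
}
```

The composed lens names the path to the nested field once, so the update itself is a single call instead of nested copy calls.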

nafg


Exactly my thought; just recently read about Lenses in Haskell and a bell rang immediately when reading this thread :) – Erik Kaplun Jun 24 '14 at 13:38
Although, could you, by any luck, provide a concrete (pseudo)code example of how you would apply lenses in this particular scenario? – Erik Kaplun Jun 24 '14 at 13:39

score 0 · Answer 7 · answered Feb 19 '21 at 19:19

If any poor souls still have to deal with XML in 2021 with Scala, here's one more library-based solution that I find particularly nice:

import scala.xml._
import jstengel.ezxml.core.SimpleWrapper.ElemWrapper
import jstengel.ezxml.core.XmlPath.\~

val InputXml: Elem = <root>
    <subnode>
        <version>1</version>
    </subnode>
    <contents>
        <version>1</version>
    </contents>
</root>

(InputXml \~ "subnode" \~ "version").transformTarget(_ => <version>2</version>)

Library in question: https://github.com/JulienSt/ezXML (thank you, Mr. JulienSt!)

Try it at https://scastie.scala-lang.org/EYJG5B91Q3KiVD7h9940JA

score -2 · Answer 8 · answered Jun 09 '09 at 22:17

I really don't know how this could be done elegantly. FWIW, I would go for a different approach: use a custom model class for the info you're handling, and have conversion to and from XML for it. You're probably going to find it's a better way to handle the data, and it's even more succinct.

However, if there is a nice way to do it with XML directly, I'd like to see it.
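A minimal sketch of the model-class approach might look like this. The class names Versioned and RootModel and the helpers fromXml and toXml are hypothetical, chosen only to mirror the XML in the question.

```scala
import scala.xml._

object ModelDemo {
  // Hypothetical model classes mirroring the XML in the question.
  final case class Versioned(version: Int)
  final case class RootModel(subnode: Versioned, contents: Versioned)

  // Read the <version> child of the element with the given label.
  private def readSection(parent: Node, label: String): Versioned =
    Versioned((parent \ label \ "version").text.trim.toInt)

  def fromXml(root: Node): RootModel =
    RootModel(readSection(root, "subnode"), readSection(root, "contents"))

  def toXml(model: RootModel): Elem =
    <root>
      <subnode><version>{model.subnode.version.toString}</version></subnode>
      <contents><version>{model.contents.version.toString}</version></contents>
    </root>

  val input: Node =
    <root><subnode><version>1</version></subnode><contents><version>1</version></contents></root>

  // Parse, update the model with plain copy calls, and serialize again.
  val parsed: RootModel  = fromXml(input)
  val updated: RootModel = parsed.copy(subnode = parsed.subnode.copy(version = 2))
  val output: Elem       = toXml(updated)
}
```

The update itself is then ordinary case-class code, at the cost of writing and maintaining the two conversion functions.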
