
Remark: please consider XPath syntax off the table here, thank you.

I have an XML node (HTML, actually), and I would like to get one of its attributes.

In C# (HTMLAgilityPack) I can get an attribute object by name. For example, given an "a" node, I can ask for its "href" attribute.

In Scala there is an "attribute" method on xml.Node, but it returns a sequence of... nodes. Is an attribute a node? How is it possible to have several attributes with the same name? I am completely puzzled.

Moreover, there is an xml.Attribute class, but I don't see it used in xml.Node.
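A small sketch of where xml.Attribute does show up, assuming the scala-xml module is on the classpath: Elem.attributes returns a MetaData chain whose links are PrefixedAttribute or UnprefixedAttribute, both subclasses of Attribute. The object name AttributeListing and the urn:x namespace are made up for illustration. The example also shows the only way two attributes can share a local name: they must live in different namespaces.

```scala
// Assumes the scala-xml module is on the classpath.
import scala.xml.{Elem, PrefixedAttribute, UnprefixedAttribute, XML}

object AttributeListing {
  // Two attributes with the local name "id": one unprefixed, one in urn:x.
  val elem: Elem =
    XML.loadString("""<foo xmlns:p="urn:x" p:id="1" id="2"/>""")

  // Walk the MetaData chain; each non-empty link is an Attribute subclass.
  def listAttributes(e: Elem): Map[String, String] =
    e.attributes.collect {
      case a: PrefixedAttribute   => s"${a.pre}:${a.key}" -> a.value.map(_.text).mkString
      case a: UnprefixedAttribute => a.key -> a.value.map(_.text).mkString
    }.toMap

  def main(args: Array[String]): Unit = {
    println(listAttributes(elem))
    // attribute(key) only matches unprefixed attributes...
    println(elem.attribute("id").map(_.map(_.text).mkString))
    // ...while attribute(uri, key) resolves the prefix through elem.scope.
    println(elem.attribute("urn:x", "id").map(_.map(_.text).mkString))
  }
}
```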

I have the PiS (Programming in Scala) book, but its XML chapter is very shallow.

The question

How should I understand asking for an attribute and getting a collection of nodes?

In other words: what is the sense in returning an option of a collection of nodes instead of returning an attribute?

  • option: if there is no attribute, the collection could simply be empty, so the Option doubles the semantics
  • collection: this implies that multiple attributes are possible, so I am curious in what scenario I would get a collection of size > 1
  • node: an attribute is a pretty simple entity, so why such overkill, suggesting that an attribute can have a tree structure?
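One way to read the three layers, sketched with scala-xml (assumed to be on the classpath): the Option says whether the attribute is present at all, and the Seq[Node] leaves room for a value built from several nodes, for example text mixed with an entity reference. The helper attrText and the object name are hypothetical, and the multi-node value is constructed by hand rather than parsed.

```scala
// Assumes the scala-xml module is on the classpath.
import scala.xml.{Elem, EntityRef, Node, Null, Text, UnprefixedAttribute, XML}

object AttributeLayers {
  // Collapse Option[Seq[Node]] into Option[String]:
  // None = attribute absent, Some(s) = the concatenated text of the value nodes.
  def attrText(e: Node, key: String): Option[String] =
    e.attribute(key).map(nodes => nodes.map(_.text).mkString)

  // A value made of three nodes: text, an entity reference, more text.
  val title = new UnprefixedAttribute(
    "title", Seq[Node](Text("a "), EntityRef("amp"), Text(" b")), Null)

  // Elem.% returns a copy of the element with the attribute added.
  val elem: Elem = XML.loadString("<foo/>") % title

  def main(args: Array[String]): Unit = {
    println(elem.attribute("title").map(_.size)) // number of nodes in the value
    println(attrText(elem, "title"))
    println(attrText(elem, "missing"))
  }
}
```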
greenoldman

2 Answers


You just want to get the value of an attribute, yes? In which case that's pretty easy:

scala> val x = <foo this="xx" that="yy" />
x: scala.xml.Elem = <foo this="xx" that="yy"></foo>

scala> x.attribute("this")
res0: Option[Seq[scala.xml.Node]] = Some(xx)

scala> x.attribute("this").get.toString
res1: String = xx

I know that you said that you explicitly aren't interested in XPath syntax, but in this instance it really is rather neater:

scala> x \ "@this"
res2: scala.xml.NodeSeq = xx
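For comparison, a sketch of how the two lookups treat a missing attribute, assuming scala-xml is on the classpath and parsing the same element with XML.loadString. asAttrMap on MetaData flattens all attributes into a plain Map[String, String]; AttributeLookup is an illustrative name.

```scala
// Assumes the scala-xml module is on the classpath.
import scala.xml.{Elem, XML}

object AttributeLookup {
  val x: Elem = XML.loadString("""<foo this="xx" that="yy"/>""")

  def main(args: Array[String]): Unit = {
    // \ "@name" never fails: a missing attribute yields an empty NodeSeq,
    // whose text is the empty string.
    println((x \ "@this").text)
    println((x \ "@missing").isEmpty)
    // attribute(...) keeps "missing" explicit as None instead.
    println(x.attribute("missing"))
    // All attributes as prefixed-key -> text pairs.
    println(x.attributes.asAttrMap)
  }
}
```

The trade-off is visible here: the XPath-like form is convenient when an empty string is an acceptable default, while attribute(...) forces the caller to decide what absence means.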

Having said all of this, you should be aware that there are many problems with attribute handling in Scala's built-in XML handling. See, for example, this, this and this.

Paul Butcher
  • Thank you, but I am not asking a how-to-do-it question; I am asking how to understand it. To me it is like looking at int + int => String. See update. – greenoldman Nov 12 '11 at 11:12
  • Ah - sorry. So the first thing to realise is that the design of the built-in XML processing is bad. It's difficult to understand and difficult to use. Don't be surprised if things don't make sense - it's not that you're misunderstanding, it really is a bad design. Read the links in my answer for examples of how it's badly designed. But to try to be a bit more helpful - the library chooses to represent everything as a Node, even attributes, even though an attribute can't have children. – Paul Butcher Nov 12 '11 at 11:20
  • The links you provided are superb (thanks to them I also checked AntiXML in the meantime, and it is bad as well: there is no descendants method on a node). It is good to know I didn't go crazy ;-D and that it is actually this particular library. Thank you again. – greenoldman Nov 12 '11 at 11:31

I realise that Paul's follow up answer pretty much covers your question but I'd just like to add a few more points:

  1. I personally don't like the design of Scala XML, to the extent that I wrote an alternative library, Scales Xml, but I wouldn't call it badly designed. Elements of its design are apparently also good enough to form the basis of Anti-Xml's approach (elements owning their children, a concept of grouping nodes, etc.), but there are many quirks, treating attributes and text as containers being a large one.
  2. I've only recently committed the descendant axis to Scales. Its greedy nature works differently from descendant-or-self: as per the spec, //para[1] does not mean the same as the location path /descendant::para[1].
  3. I'm not sure you can attribute bad design to Anti-Xml for the absence of descendant either; it's a young project (just over seven months old?) and they may simply not have gotten round to adding it yet.

The direct answer to the attribute question, using Scales, is:

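import scales.utils._
import ScalesUtils._
import scales.xml._
import ScalesXml._

// A namespace bound to the prefix "pre", used for the second "attr"
val pre = Namespace("uri:test").prefixed("pre")

// "fred"l is a no-namespace QName; both attributes have the local name "attr"
val elem = Elem("fred"l, emptyAttributes + 
        ("attr", "value") +
        Attribute(pre("attr"), "value"))

// Look up by QName, like a map
println("attributes are a map " + elem.attributes("attr"))

// Adding an attribute with an existing QName replaces it, like a set
println("attributes are a set " + (
  elem.attributes + ("attr", "new value")))

// XPath-style query for the namespaced attribute, starting at the top of the tree
val xpath = top(elem) \@ pre("attr")

// Print the QName of each matching attribute path
xpath foreach{ap => println(ap.name)}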

giving

[info] attributes are a map Some(Attribute({}attr,value))
[info] attributes are a set ListSet(Attribute({}attr,new value), Attribute({uri:test}attr,value))
[info] {uri:test}attr

The XPath syntax must return a collection, as any number of paths could reach a matching attribute. Element attributes themselves are matched by QName: "attr" means no namespace and a local name of attr. For additional sanity, an attribute QName is:

type AttributeQName = EitherLike[PrefixedQName, NoNamespaceQName]

The compiler makes sure that no local-name-only QNames creep in.
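Scala's built-in scala-xml module shows the same reasoning about collections: a descendant attribute query can reach several attributes, one per path. This is a sketch against scala-xml (assumed to be on the classpath), not the Scales API, and AttributePaths/allIds are illustrative names.

```scala
// Assumes the scala-xml module is on the classpath.
import scala.xml.{Elem, XML}

object AttributePaths {
  val doc: Elem =
    XML.loadString("""<root><a id="1"/><b><a id="2"/></b></root>""")

  // \\ "@id" searches the element and all its descendants, so every path
  // that reaches an id attribute contributes one value, in document order.
  def allIds(e: Elem): List[String] = (e \\ "@id").map(_.text).toList

  def main(args: Array[String]): Unit =
    println(allIds(doc))
}
```

This is one concrete scenario where a query "for an attribute" legitimately yields a collection of size > 1.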

As an aside, whilst I understand why the XPath-like syntax of Scala XML is probably uninteresting to you, you should have a look at Scales for XPath-based querying.

There is both string-based XPath 1.0 querying (not yet pushed into a non-snapshot version) and an internal DSL that lets the compiler/IDE help you out (plus the bonus of being far quicker and working with Scala code directly).

Chris