
I have a beginner's question regarding the W3C specification (EBNF notation) of XPath expressions. The specification can be found at http://www.w3.org/TR/xpath/. In particular, I have a question about understanding the following expression:

(//attribute::name | //attribute::id)[starts-with(string(self::node()), "be") or starts-with(string(self::node()), "1")]

This appears to be a valid expression. I verified it using http://www.freeformatter.com/xpath-tester.html with the following XML document:

<documentRoot>
<!-- Test data -->
<?xc value="2" ?>
<parent name="data" >
   <child id="1"  name="alpha" >Some Text</child>
   <child id="2"  name="beta" >
      <grandchild id="2.1"  name="beta-alpha" ></grandchild>
      <grandchild id="2.2"  name="beta-beta" ></grandchild>
   </child>
   <pet name="tigger"  type="cat" >
      <data>
         <birthday month="sept"  day="19" ></birthday>
         <food name="Acme Cat Food" ></food>
      </data>
   </pet>
   <pet name="Fido"  type="dog" >
      <description>
         Large dog!
      </description>
      <data>
         <birthday month="feb"  day="3" ></birthday>
         <food name="Acme Dog Food" ></food>
      </data>
   </pet>
   <rogue name="is this real?" >
      <data>
         Hates dogs!
      </data>
   </rogue>
   <child id="3"  name="gamma"  mark="yes" >
      <!-- A comment -->
      <description>
         Likes all animals - especially dogs!
      </description>
      <grandchild id="3.1"  name="gamma-alpha" >
         <![CDATA[ Some non-parsable character data ]]>
      </grandchild>
      <grandchild id="3.2"  name="gamma-beta" ></grandchild>
   </child>
</parent>
</documentRoot>

This gives me the following results:

Attribute='id="1"'
Attribute='name="beta"'
Attribute='name="beta-alpha"'
Attribute='name="beta-beta"'

It is not clear to me which sequence of EBNF productions would produce the above query.

Thanks for your help.

user1362700

2 Answers


Break-down:

(                        # group
  //attribute::name      #   the long form of //@name
  |                      #   union
  //attribute::id        #   the long form of //@id 
)                        # group end
[                        # predicate (think "where")
  starts-with(           #   returns true or false
    string(              #     returns a string
      self::node()       #        the long form of "."
    ),                   #     )
    "be"                 #     a string literal
  )                      #   )
  or                     #   logical operator
  starts-with(           #   ...idem
    string(              #
      self::node()       #
    ),                   #
    "1"                  #
  )                      #
]                        # end predicate

So the expression is a rather unnecessarily verbose version of

(//@name | //@id)[starts-with(., "be") or starts-with(., "1")]

selecting all attributes named "name" or "id" whose values begin with "be" or "1".

I'm not sure why you want the EBNF productions for this (homework, I presume), but understanding the expression itself might help you with it.

A few extra notes:

  • attribute:: designates the attribute axis.
  • Axes can precede any node test (the default axis is always child::).
  • The self:: axis is special: it contains only the context node itself. The short form of self::node() is the dot (.). The implication is that if the context node is a <foo> element, self::foo will match it, while self::bar will not.
  • // is the shorthand for /descendant-or-self::node()/
  • The string() function is redundant because starts-with() will convert its arguments to string implicitly anyway.
  • The union operator joins two node sets. Nodes that appear in both sets are not duplicated in the result.
  • Predicates are applied to each node in a node set, effectively filtering it.
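
To make the notes above concrete, here is a minimal sketch in Python that emulates what the shortened expression selects. It is not an XPath engine: it uses the standard library's xml.etree.ElementTree and plain loops, and the XML is a trimmed subset of the question's document (an assumption made to keep the example short). The function name select_attributes is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed subset of the question's sample document (assumption: shortened for brevity).
XML = """
<documentRoot>
<parent name="data">
   <child id="1" name="alpha">Some Text</child>
   <child id="2" name="beta">
      <grandchild id="2.1" name="beta-alpha"></grandchild>
      <grandchild id="2.2" name="beta-beta"></grandchild>
   </child>
   <pet name="tigger" type="cat"></pet>
</parent>
</documentRoot>
"""


def select_attributes(root):
    """Emulate (//@name | //@id)[starts-with(., "be") or starts-with(., "1")]."""
    selected = []
    # root.iter() visits the root and all of its descendants in document order.
    for element in root.iter():
        for attr_name, value in element.attrib.items():
            # The union part: only attributes called "name" or "id" are candidates.
            if attr_name not in ("name", "id"):
                continue
            # The predicate part: keep the attribute if its string value matches.
            if value.startswith("be") or value.startswith("1"):
                selected.append((attr_name, value))
    return selected


root = ET.fromstring(XML)
result = select_attributes(root)
for attr_name, value in result:
    print(f'{attr_name}="{value}"')
```

On this subset the loop keeps id="1", name="beta", name="beta-alpha" and name="beta-beta", the same four attributes the question reports for the full document.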
Tomalak
  • To be pedantic `//` is a shorthand for `/descendant-or-self::node()/`, not `/descendant::` (true, in many cases the distinction doesn't matter, but there are certain cases where it's critical, e.g. to make `//@foo` well-defined, or the difference between `//*[1]` and `/descendant::*[1]`) – Ian Roberts Jan 17 '14 at 11:18
  • @Ian Thanks, will correct this! To be even more pedantic, `//` is the shorthand for `/descendant-or-self::` - nothing less, but nothing more, too. `node()` is not part of it. :) – Tomalak Jan 17 '14 at 11:34
  • No, `//` means precisely `/descendant-or-self::node()/` including the leading and trailing slashes ([XPath spec §2.5](http://www.w3.org/TR/xpath/#path-abbrev)). Otherwise you wouldn't be able to say things like `//@foo` as that would be `/descendant::attribute::foo` (you can't use two axes in the same location step). – Ian Roberts Jan 17 '14 at 11:38
  • Hm. Apparently there was an error in my logic until now (all these years!! *howl*). You are absolutely right, thanks for pointing this out to me. – Tomalak Jan 17 '14 at 11:41
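
The difference between `//x[1]` and `/descendant::x[1]` mentioned in the comments can be sketched with Python's standard xml.etree.ElementTree. This is only an illustration under some assumptions: ElementTree supports a limited XPath subset and needs a tag name before a positional predicate, so the sketch uses `grandchild` instead of `*`, and the `/descendant::` case is emulated with `iter()` rather than evaluated as XPath. The XML is a cut-down version of the question's document.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the question's document (assumption: only ids are kept).
XML = """
<parent>
   <child id="2">
      <grandchild id="2.1"/>
      <grandchild id="2.2"/>
   </child>
   <child id="3">
      <grandchild id="3.1"/>
      <grandchild id="3.2"/>
   </child>
</parent>
"""

root = ET.fromstring(XML)

# .//grandchild[1] expands to a child step below every descendant-or-self node,
# so [1] is evaluated per parent: the first grandchild of each <child>.
first_per_parent = [e.get("id") for e in root.findall(".//grandchild[1]")]

# /descendant::grandchild[1] would instead pick the first grandchild in the
# whole document; emulated here by taking the first hit of iter().
first_overall = next(root.iter("grandchild")).get("id")

print(first_per_parent)
print(first_overall)
```

The per-parent query returns one element for each `<child>`, while the emulated `/descendant::` form returns a single element.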

I don't know the correct notation for this, but the top-level derivation is Expr >>> FilterExpr Predicate:

Expr > OrExpr > AndExpr > EqualityExpr > RelationalExpr > AdditiveExpr > MultiplicativeExpr > UnaryExpr > UnionExpr > PathExpr > FilterExpr > FilterExpr Predicate

gives you the 2 parts:

  • the filter (//attribute::name | //attribute::id)
  • and the predicate [starts-with(string(self::node()), "be") or starts-with(string(self::node()), "1")]
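
The chain above can be checked mechanically. Below is a small sketch, assuming a hand-copied subset of the XPath 1.0 productions (only the alternatives needed here; the comparison and arithmetic alternatives are omitted). The names GRAMMAR and is_derivation are hypothetical helpers, not part of any library.

```python
# Subset of the XPath 1.0 grammar: nonterminal -> list of alternatives.
# Assumption: alternatives not needed for this expression are left out.
GRAMMAR = {
    "Expr": [["OrExpr"]],
    "OrExpr": [["AndExpr"], ["OrExpr", "'or'", "AndExpr"]],
    "AndExpr": [["EqualityExpr"], ["AndExpr", "'and'", "EqualityExpr"]],
    "EqualityExpr": [["RelationalExpr"]],
    "RelationalExpr": [["AdditiveExpr"]],
    "AdditiveExpr": [["MultiplicativeExpr"]],
    "MultiplicativeExpr": [["UnaryExpr"]],
    "UnaryExpr": [["UnionExpr"], ["'-'", "UnaryExpr"]],
    "UnionExpr": [["PathExpr"], ["UnionExpr", "'|'", "PathExpr"]],
    "PathExpr": [
        ["LocationPath"],
        ["FilterExpr"],
        ["FilterExpr", "'/'", "RelativeLocationPath"],
        ["FilterExpr", "'//'", "RelativeLocationPath"],
    ],
    "FilterExpr": [["PrimaryExpr"], ["FilterExpr", "Predicate"]],
    "PrimaryExpr": [
        ["VariableReference"],
        ["'('", "Expr", "')'"],
        ["Literal"],
        ["Number"],
        ["FunctionCall"],
    ],
    "Predicate": [["'['", "PredicateExpr", "']'"]],
    "PredicateExpr": [["Expr"]],
}


def is_derivation(chain):
    """Return True if every symbol in the chain appears in some
    alternative of the symbol before it."""
    for parent, child in zip(chain, chain[1:]):
        alternatives = GRAMMAR.get(parent, [])
        if not any(child in alternative for alternative in alternatives):
            return False
    return True


top_chain = [
    "Expr", "OrExpr", "AndExpr", "EqualityExpr", "RelationalExpr",
    "AdditiveExpr", "MultiplicativeExpr", "UnaryExpr", "UnionExpr",
    "PathExpr", "FilterExpr",
]

chain_ok = is_derivation(top_chain)
# The last step of the answer's chain: FilterExpr > FilterExpr Predicate.
last_step_ok = ["FilterExpr", "Predicate"] in GRAMMAR["FilterExpr"]
print(chain_ok, last_step_ok)
```

The same helper confirms the parenthesised part: `is_derivation(["FilterExpr", "PrimaryExpr", "Expr"])` holds because PrimaryExpr has the alternative '(' Expr ')'.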

(//attribute::name | //attribute::id)

FilterExpr > PrimaryExpr > '(' Expr ')'
Expr > OrExpr > AndExpr > EqualityExpr > RelationalExpr > AdditiveExpr > MultiplicativeExpr > UnaryExpr > UnionExpr > UnionExpr '|' PathExpr

gives you //attribute::name and //attribute::id

//attribute::name and //attribute::id

PathExpr > LocationPath > AbsoluteLocationPath > AbbreviatedAbsoluteLocationPath > '//' RelativeLocationPath
RelativeLocationPath > Step > AxisSpecifier NodeTest Predicate*
    - AxisSpecifier > AxisName '::'
        - AxisName > 'attribute'
    - NodeTest > NameTest

NameTest being name and id

Predicate [starts-with(string(self::node()), "be") or starts-with(string(self::node()), "1")]

Predicate > '[' PredicateExpr ']'
PredicateExpr > Expr > OrExpr > OrExpr 'or' AndExpr
    - OrExpr > AndExpr
    - AndExpr > EqualityExpr > RelationalExpr > AdditiveExpr > MultiplicativeExpr > UnaryExpr > UnionExpr > PathExpr > FilterExpr > PrimaryExpr > FunctionCall > FunctionName '(' ( Argument ( ',' Argument )* )? ')'
        Argument > Expr

FunctionName being starts-with, the first argument being another FunctionCall (the string function), and the second argument being a Literal (via PathExpr > FilterExpr > PrimaryExpr > Literal), "be" or "1".

Finally, self::node() comes from:

RelativeLocationPath > Step > AxisSpecifier NodeTest Predicate*
    - AxisSpecifier > AxisName '::'
        - AxisName > 'self'
    - NodeTest > NodeType '(' ')'

NodeType being 'node'

paul trmbrth
  • Great!! Now I get it. Thank you so much. I totally missed the FilterExpression rule. I was stuck with "Step = AxisSpecifier NodeTest Predicate". Very quick and good reply!! – user1362700 Jan 18 '14 at 06:05