31

In the XML sample below, I would like to select all the books that belong to class foo but not to class bar, using XPath.

<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book class="foo">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book class="foo bar">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book class="foo bar">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>
topless
    Good question, +1. See my answer for two different XPath 2.0 solutions of which the first might be the most efficient of them all especially with a non-optimizing XPath 2.0 engine. – Dimitre Novatchev Apr 17 '11 at 01:11

3 Answers

39

By padding the @class value with leading and trailing spaces, you can test for the presence of " foo " and " bar " without worrying about whether the token comes first, in the middle, or last, and without false-positive hits on @class values such as "food" or "barren":

/bookstore/book[contains(concat(' ',@class,' '),' foo ')
        and not(contains(concat(' ',@class,' '),' bar '))]
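
To see why the padding matters, here is a minimal Python sketch of the same test outside XPath. It assumes only the standard library's `xml.etree.ElementTree`, whose XPath subset has no `contains()`, so the predicate is re-implemented in a hypothetical helper `has_class`. The sample document is abbreviated, and the "food" book is an addition made here purely to show the false-positive case.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample; the "food" book is NOT in the original question,
# it is added only to demonstrate the false-positive case.
XML = """<bookstore>
  <book class="foo"><title>A</title></book>
  <book class="foo bar"><title>B</title></book>
  <book class="food"><title>C</title></book>
</bookstore>"""


def has_class(value, name):
    """Mirror contains(concat(' ', @class, ' '), ' name ')."""
    # Padding both strings with spaces turns a substring test into a whole-token test.
    return f" {name} " in f" {value} "


def select_books(xml_text):
    """Return the <book> elements whose class has 'foo' but not 'bar'."""
    root = ET.fromstring(xml_text)
    return [
        book
        for book in root.findall("book")
        if has_class(book.get("class", ""), "foo")
        and not has_class(book.get("class", ""), "bar")
    ]


titles = [book.findtext("title") for book in select_books(XML)]
print(titles)
```

Without the padding, a plain substring test would also accept "food", because "foo" is a substring of it.
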
Shimmy Weitzhandler
Mads Hansen
    What if `@class` contains a tab or even a newline character instead of a space? The `normalize-space` function (XPath 1.0) comes in handy here: it strips leading and trailing whitespace from a string and replaces sequences of whitespace characters with a single space, e.g. `concat(' ',normalize-space(@class),' ')` – Steven Pribilinskiy Mar 01 '15 at 09:27
  • @Steven Pribilinskiy - That should not be necessary. Due to how attribute values are normalized by the XML parser, tabs and carriage returns will have already been normalized into a space. http://www.w3.org/TR/xml/#AVNormalize – Mads Hansen Mar 01 '15 at 15:48
11

Although I like Mads's solution, here is another approach for XPath 2.0:

/bookstore/book[
                 tokenize(@class," ")="foo" 
                 and not(tokenize(@class," ")="bar")
               ]

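Please note that the = operator is a general comparison in XPath 2.0: when an operand is a sequence, the comparison is true if any item in the sequence matches. The following expressions are therefore both true: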

("foo","bar")="foo" -> true
("foo","bar")="bar" -> true
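
The "any item matches" behaviour of the comparison can be imitated in Python as a rough analogy. This is only a sketch: `general_eq` and `is_foo_not_bar` are hypothetical helper names, and `str.split(" ")` stands in for `tokenize(@class," ")` (the two differ on edge cases such as an empty string, which does not affect the result here).

```python
def general_eq(sequence, value):
    """Imitate an XPath 2.0 general comparison: true if ANY item equals value."""
    return any(item == value for item in sequence)


def is_foo_not_bar(class_attr):
    """Tokenize on single spaces, then test membership of 'foo' and 'bar'."""
    tokens = class_attr.split(" ")  # stand-in for tokenize(@class, " ")
    return general_eq(tokens, "foo") and not general_eq(tokens, "bar")


for value in ("foo", "foo bar", "food"):
    print(value, is_foo_not_bar(value))
```

Because whole tokens are compared, "food" is rejected without any padding trick.
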
Dennis Münkle
4

XPath 2.0:

/*/*[for $s in concat(' ',@class,' ') 
            return 
               matches($s, ' foo ') 
             and 
              not(matches($s, ' bar '))
      ]

Here no tokenization is done and $s is calculated only once.

Or even:

/*/book[@class
          [every $t in tokenize(.,' ') satisfies $t ne 'bar']
          [some  $t in tokenize(.,' ') satisfies $t eq 'foo']
       ]
Dimitre Novatchev