
Background: I am using HtmlAgilityPack (.NET), so I'm forced to use XPath 1.0, which doesn't have a lower-case() function.

I am trying to find all the nodes that have an attribute containing "foo" as a whole word, case-insensitively.
Examples:

  • "foo" match
  • "my foo" match
  • "foo bar" match
  • "Foo" match
  • "ifoo" no match
  • "food" no match

This is what I have (there is also no ends-with() in XPath 1.0, hence the substring() test at the end):

//*[@*[starts-with(.,'foo ') or contains(.,' foo ') or .='foo' or substring(.,string-length(.) - 3)=' foo']]
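A quick way to check this expression is to run it through any XPath 1.0 engine. The sketch below uses Java's built-in javax.xml.xpath (not HtmlAgilityPack) on a small XML document that holds the six example values in a hypothetical title attribute. It suggests the expression handles the word boundaries but misses "Foo", because every comparison is case-sensitive.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class QuestionExpression {
    // One element per example value from the question (plain XML stands in for HTML).
    static final String XML =
        "<root><a title='foo'/><a title='my foo'/><a title='foo bar'/>"
        + "<a title='Foo'/><a title='ifoo'/><a title='food'/></root>";

    // The question's expression, unchanged: a case-sensitive whole-word test.
    static final String EXPR =
        "//*[@*[starts-with(.,'foo ') or contains(.,' foo ') or .='foo'"
        + " or substring(.,string-length(.) - 3)=' foo']]";

    // Returns the title attribute of every element that EXPR selects.
    static List<String> matches() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(EXPR, doc, XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(((Element) nodes.item(i)).getAttribute("title"));
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "Foo" is absent from the result: the comparisons are case-sensitive.
        System.out.println(matches()); // [foo, my foo, foo bar]
    }
}
```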

According to this, I can use this horrific method to lower-case the search criteria:

translate(.,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')
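Applying that call naively means repeating it in every test of the expression, because XPath 1.0 has no way to bind the lower-cased value to a name. The Java sketch below (again javax.xml.xpath on the hypothetical sample document, not HtmlAgilityPack) keeps the translate() call in one host-language constant and splices it in with String.format. That is one way to share it between expressions, but it is only safe for trusted, constant text, and translate() lower-cases only the 26 ASCII letters listed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LowerCasedQuestion {
    static final String XML =
        "<root><a title='foo'/><a title='my foo'/><a title='foo bar'/>"
        + "<a title='Foo'/><a title='ifoo'/><a title='food'/></root>";

    // Lower-cases ASCII letters of the context node; written once, reused below.
    static final String LOWER =
        "translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')";

    // The question's tests with each compared '.' replaced by LOWER.
    // string-length(.) is left alone: this translate() maps one letter to one letter.
    static final String EXPR = String.format(
        "//*[@*[starts-with(%1$s,'foo ') or contains(%1$s,' foo ') or %1$s='foo'"
        + " or substring(%1$s,string-length(.) - 3)=' foo']]", LOWER);

    // Returns the title attribute of every element that EXPR selects.
    static List<String> matches() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(EXPR, doc, XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(((Element) nodes.item(i)).getAttribute("title"));
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches()); // [foo, my foo, foo bar, Foo]
    }
}
```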

Finally, my question: how do I use the translate() function while keeping the expression as short and as readable as possible?

(Bonus: How do I share this between different expressions?)

seldary
What do you mean by "How do I share this between different expressions"? The rest is answered very well by Tomalak -- using a sentinel approach is a well-known technique in programming. – Dimitre Novatchev Jul 29 '12 at 14:34

2 Answers


How about adding custom functions to XPath?
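As a sketch of that idea in a different engine: Java's javax.xml.xpath lets an application register its own functions through XPathFunctionResolver. The example registers a hypothetical ex:lower-case() in a made-up namespace and uses it in place of translate(). HtmlAgilityPack relies on .NET's XPath engine, where the analogous hook is a custom XsltContext; that variant is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CustomLowerCase {
    // Hypothetical namespace URI for the extension function; any URI would do.
    static final String NS = "urn:example:functions";

    static final String XML =
        "<root><a title='foo'/><a title='my foo'/><a title='foo bar'/>"
        + "<a title='Foo'/><a title='ifoo'/><a title='food'/></root>";

    // Uses the custom function instead of translate().
    static final String EXPR =
        "//*[@*[contains(concat(' ', ex:lower-case(normalize-space(.)), ' '), ' foo ')]]";

    static List<String> matches() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the prefix "ex" to NS so the expression can name the function.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "ex".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) { return null; }
                @Override public Iterator<String> getPrefixes(String uri) { return null; }
            });
            // Resolve ex:lower-case with one argument; any other function stays unresolved.
            xpath.setXPathFunctionResolver((name, arity) -> {
                if (NS.equals(name.getNamespaceURI())
                        && "lower-case".equals(name.getLocalPart()) && arity == 1) {
                    // normalize-space() yields a string, so args.get(0) is a String here.
                    return args -> String.valueOf(args.get(0)).toLowerCase(Locale.ROOT);
                }
                return null;
            });
            NodeList nodes = (NodeList) xpath.evaluate(EXPR, doc, XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(((Element) nodes.item(i)).getAttribute("title"));
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches()); // [foo, my foo, foo bar, Foo]
    }
}
```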


Just use (summarized version of Tomalak's answer):

//@*[contains(concat(' ', 
                     translate(normalize-space(), 'FOO', 'foo'), 
                     ' '), 
              ' foo '
              )
    ]
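The summarized expression can be checked the same way. This Java sketch (javax.xml.xpath on the hypothetical sample document, not HtmlAgilityPack) evaluates it unchanged and lists the values of the matching attributes. translate() only needs to map F and O here, since no other letter can be part of a match, and the sentinel spaces added by concat() make the start, middle, and end of the value behave alike.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AnswerExpression {
    static final String XML =
        "<root><a title='foo'/><a title='my foo'/><a title='foo bar'/>"
        + "<a title='Foo'/><a title='ifoo'/><a title='food'/></root>";

    // The answer's expression, unchanged: it selects attribute nodes, not elements.
    static final String EXPR =
        "//@*[contains(concat(' ', translate(normalize-space(), 'FOO', 'foo'), ' '), ' foo ')]";

    // Returns the value of every attribute that EXPR selects, in document order.
    static List<String> matchingValues() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(EXPR, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matchingValues()); // [foo, my foo, foo bar, Foo]
    }
}
```

The expression returns attribute nodes. If the elements that carry them are wanted, as in the question, appending /.. selects the owning elements instead.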

Warning:

Never insert a string received from an unknown agent (such as an end user) into a placeholder in an XPath expression. This opens a gaping hole for XPath injection attacks.

The recommended practice is to compile the XPath expression once and to pass the user-supplied string as a parameter (or make it available through a variable or function reference in the XPath expression) when the evaluation is done.
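A minimal sketch of that practice with Java's javax.xml.xpath, standing in for the .NET API, which it does not show: the expression is compiled once with a variable $word, and an XPathVariableResolver supplies the searched word at evaluation time. The class and method names are hypothetical; the full A-Z translate() makes the test case-insensitive for ASCII letters only, and the word is lower-cased in Java before it is passed in.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WordFinder {
    static final String XML =
        "<root><a title='foo'/><a title='my foo'/><a title='foo bar'/>"
        + "<a title='Foo'/><a title='ifoo'/><a title='food'/></root>";

    // $word is an XPath variable: user input never becomes part of the expression text.
    static final String EXPR =
        "//@*[contains(concat(' ', translate(normalize-space(),"
        + " 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), ' '),"
        + " concat(' ', $word, ' '))]";

    // Variable values read by the resolver during evaluation (not thread-safe).
    private final Map<String, Object> vars = new HashMap<>();
    private final XPathExpression expr;

    WordFinder() {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Set before compile(): the resolver in effect at compile time is the one used.
        xpath.setXPathVariableResolver(name -> vars.get(name.getLocalPart()));
        try {
            expr = xpath.compile(EXPR);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Values of attributes containing `word` as a whole word, ignoring ASCII case.
    List<String> find(Document doc, String word) {
        vars.put("word", word.toLowerCase(Locale.ROOT));
        try {
            NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    static Document sample() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        WordFinder finder = new WordFinder();
        Document doc = sample();
        System.out.println(finder.find(doc, "foo"));          // [foo, my foo, foo bar, Foo]
        System.out.println(finder.find(doc, "BAR"));          // [foo bar]
        System.out.println(finder.find(doc, "x' or '1'='1")); // []
    }
}
```

Because the user's string only ever arrives as the value of $word, the quote characters in the last call are compared as ordinary text and cannot change the structure of the expression.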

Dimitre Novatchev