16

What expression would select all text nodes which are:

  • not blank
  • not inside a, script, or style elements?
ThinkingStiff
Majid Fouladpour

3 Answers

17

Use:

//*[not(self::a or self::script or self::style)]/text()[normalize-space()]

Not only is this expression shorter than the one in the currently accepted answer, but it may also be much more efficient.

Note that the expression doesn't use any reverse axes (such as ancestor:: or parent::) at all.
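
To see which nodes this kind of expression picks, here is a minimal Python sketch. It is an illustration only, not the answer's own code: the standard library's ElementTree does not support `text()` or `normalize-space()` in its XPath subset, so the sketch walks the tree by hand and applies the same two conditions (parent is not a, script or style; text is not blank). The sample document and all function names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the original question.
SAMPLE = """<body>
  <p>Intro <a href="#">link <b>bold</b></a> outro</p>
  <script>var x = 1;</script>
  <style>p { color: red; }</style>
</body>"""

EXCLUDED_PARENTS = {"a", "script", "style"}
XPATH_WHITESPACE = " \t\r\n"  # the characters normalize-space() treats as whitespace


def text_nodes(elem):
    """Yield (parent, text) pairs for all text nodes under elem, in document order."""
    if elem.text:
        yield elem, elem.text
    for child in elem:
        yield from text_nodes(child)
        if child.tail:  # text that follows a child element belongs to elem
            yield elem, child.tail


def select_by_parent(root):
    """Keep non-blank text nodes whose parent is not a, script or style."""
    return [
        text
        for parent, text in text_nodes(root)
        if parent.tag not in EXCLUDED_PARENTS and text.strip(XPATH_WHITESPACE)
    ]


selected = select_by_parent(ET.fromstring(SAMPLE))
print(selected)
```

On this sample the sketch keeps "Intro ", "bold" and " outro". The text directly inside a is dropped, but "bold" survives because its parent is b, not a.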

Dimitre Novatchev
  • +1 Thanks. Tested with Firebug and it does shorten the time it takes to get the nodes. – Majid Fouladpour Dec 10 '10 at 18:42
  • @Majid: You are welcome. If this is better than the currently accepted answer, you may consider accepting my answer. – Dimitre Novatchev Dec 10 '10 at 19:02
  • I would have, were it not rude to the other person who kindly answered my question. – Majid Fouladpour Dec 11 '10 at 02:16
  • What is `normalize-space()` doing in the predicate? – Arup Rakshit Sep 06 '13 at 04:17
  • TL;DR: it filters out blank nodes, so only non-blank ones are selected. The `normalize-space` function strips leading and trailing whitespace from a string, replaces each sequence of whitespace characters with a single space, and returns the resulting string (https://developer.mozilla.org/en-US/docs/Web/XPath/Functions/normalize-space). For a node that contains only whitespace it returns the empty string, so the predicate fails (empty strings are falsy: http://stackoverflow.com/questions/346226/how-to-create-a-boolean-value) and the whitespace-only nodes are omitted. – Dragon Nov 25 '15 at 12:44
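
The behaviour described in the comment above can be approximated in Python as follows. This is only a sketch of the XPath 1.0 semantics, assuming the whitespace set is space, tab, carriage return and line feed; `normalize_space` and `is_non_blank` are names made up for the example.

```python
import re

XPATH_WHITESPACE = " \t\r\n"  # whitespace as understood by XPath 1.0


def normalize_space(s: str) -> str:
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return re.sub(r"[ \t\r\n]+", " ", s.strip(XPATH_WHITESPACE))


def is_non_blank(s: str) -> bool:
    """Mimic the predicate [normalize-space()]: an empty result counts as false."""
    return bool(normalize_space(s))


print(repr(normalize_space("  hello \n  world\t")))
print(is_non_blank("\n   \t"))
```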
16

This should do, assuming "not inside" means the text node is not supposed to be a descendant of an "a" or "script" or "style" element. If "not inside" only means not supposed to be a child then use parent::a and so on instead of ancestor::a.

//text()[normalize-space() and not(ancestor::a | ancestor::script | ancestor::style)]
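
As a rough illustration of the descendant interpretation, the sketch below walks a tree with Python's standard ElementTree and carries a flag that records whether the current position is inside an a, script or style element. It mimics what the expression selects rather than evaluating it, since ElementTree's XPath subset has no `text()` or `ancestor::`. The sample document and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the original question.
SAMPLE = """<body>
  <p>Intro <a href="#">link <b>bold</b></a> outro</p>
  <script>var x = 1;</script>
  <style>p { color: red; }</style>
</body>"""

EXCLUDED = {"a", "script", "style"}
XPATH_WHITESPACE = " \t\r\n"  # the characters normalize-space() treats as whitespace


def text_outside_excluded(elem, inside_excluded=False):
    """Yield non-blank text nodes that have no a/script/style ancestor."""
    inside = inside_excluded or elem.tag in EXCLUDED
    if elem.text and not inside and elem.text.strip(XPATH_WHITESPACE):
        yield elem.text
    for child in elem:
        # Descendants inherit the flag, so nested elements stay excluded.
        yield from text_outside_excluded(child, inside)
        # The tail is a child text node of elem, so elem's flag applies.
        if child.tail and not inside and child.tail.strip(XPATH_WHITESPACE):
            yield child.tail


kept = list(text_outside_excluded(ET.fromstring(SAMPLE)))
print(kept)
```

Here "bold" is dropped along with "link ", because b sits inside a. With the parent:: variant only "link " would be dropped.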
mlissner
Martin Honnen
  • Need one help from you - is there any difference between `/books/child::*` and `/books/child::node()` ? – Arup Rakshit Jul 29 '13 at 15:59
  • Is it possible to have the condition //text()[not(ancestor::a)] but allow some specific list of words? Like if the link is on the word "Home" then we keep it. Or is it totally absurd? – student Nov 30 '18 at 15:59
3

I used Dimitre Novatchev's answer, but then I ran into the problem described by the original poster:

not descendant of a, style or script

Dimitre's answer excludes text directly inside a style tag but still includes text inside its child elements. This version also excludes the style, script, and noscript tags together with their descendants:

//div[@id='???']//*[not(ancestor-or-self::script or ancestor-or-self::noscript or ancestor-or-self::style)]/text()

Anyway, thanks to Dimitre Novatchev.

warvariuc
  • +1 Haven't tried it, but from your description, this is a more robust method. – Majid Fouladpour Jun 10 '11 at 09:34
  • @warvariuc: Why do you think a ` – Dimitre Novatchev Aug 04 '19 at 17:25