
How can I get the contents of h1, h2, and h3 in a single XPath expression?

I know I could do this.

//html/body/h1/text()
//html/body/h2/text()
//html/body/h3/text() 

and so on.

Aivan Monceller

1 Answer


Use:

/html/body/*[self::h1 or self::h2 or self::h3]/text()
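
As a quick check, here is a minimal sketch that runs this expression from Python. It assumes the third-party lxml library is installed, and the sample document is made up for illustration; it is not from the question.

```python
from lxml import etree  # assumption: lxml (third-party) is available

# Hypothetical sample document, invented for this illustration.
DOC = b"""<html>
  <body>
    <h1>Title</h1>
    <p>Intro paragraph</p>
    <h2>Section</h2>
    <h3>Subsection</h3>
    <h4>Ignored</h4>
  </body>
</html>"""

root = etree.fromstring(DOC)

# One expression selects the text children of h1, h2, and h3 elements
# that are direct children of /html/body, in document order.
headings = root.xpath("/html/body/*[self::h1 or self::h2 or self::h3]/text()")

print(headings)  # ['Title', 'Section', 'Subsection']
```

The p and h4 elements are skipped because the predicate only accepts elements that are themselves h1, h2, or h3.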

The following expression is incorrect:

//html/body/*[local-name() = "h1"  
           or local-name() = "h2"  
           or local-name() = "h3"]/text()  

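because it may also select text nodes that are children of elements in other namespaces, such as unwanted:h1, different:h2, or someWeirdNamespace:h3. The local-name() function ignores the namespace, whereas self::h1 matches only an h1 element in no namespace.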
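
The difference can be seen with a small sketch. It assumes the third-party lxml library; the document and the `urn:example:other` namespace are hypothetical and only serve to put a foreign h2 next to a real h1.

```python
from lxml import etree  # assumption: lxml (third-party) is available

# Hypothetical document: <other:h2> lives in a made-up namespace.
DOC = b"""<html xmlns:other="urn:example:other">
  <body>
    <h1>Wanted</h1>
    <other:h2>Unwanted</other:h2>
  </body>
</html>"""

root = etree.fromstring(DOC)

# local-name() compares only the local part of the name, so the
# namespaced <other:h2> is matched as well.
by_local_name = root.xpath(
    '/html/body/*[local-name() = "h1"'
    ' or local-name() = "h2"'
    ' or local-name() = "h3"]/text()'
)

# self::h2 (unprefixed) matches only an h2 in no namespace,
# so <other:h2> is left out.
by_self = root.xpath("/html/body/*[self::h1 or self::h2 or self::h3]/text()")

print(by_local_name)  # ['Wanted', 'Unwanted']
print(by_self)        # ['Wanted']
```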

Another recommendation: avoid using // whenever the structure of the XML document is statically known. Using // often results in significant inefficiency, because it causes the complete document (sub)tree rooted at the context node to be traversed.

Dimitre Novatchev
  • On the performance question, your mileage may vary. Some products go to great lengths to optimise queries using //x. – Michael Kay Nov 04 '11 at 09:49
  • I want to get the text inside the p tag. The h tag can be h3, h4, or h5.

    Ingredients:

    Tomato Purée, Acidity Regulator (Citric Acid)

    How can I get it using a single XPath? Thanks in advance.
    – Aswin Sathyan Jun 28 '16 at 07:17
  • @AswinSathyan, Just ask a separate question at SO. – Dimitre Novatchev Jun 28 '16 at 14:12
  • Great! I needed a descendant selector under body to get all headings: `/html/body//*[self::h1 or self::h2 or self::h3]/text()` – ptim Feb 21 '18 at 00:07
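
To illustrate the descendant variant from the last comment, here is a rough sketch comparing it with the child-only expression from the answer. It assumes the third-party lxml library, and the nested sample document is hypothetical.

```python
from lxml import etree  # assumption: lxml (third-party) is available

# Hypothetical document with headings nested at different depths.
DOC = b"""<html>
  <body>
    <h1>Top</h1>
    <div>
      <h2>Nested</h2>
      <section>
        <h3>Deeper</h3>
      </section>
    </div>
  </body>
</html>"""

root = etree.fromstring(DOC)

# Child step: only headings that are direct children of body.
children_only = root.xpath(
    "/html/body/*[self::h1 or self::h2 or self::h3]/text()"
)

# Descendant step (//) under body: headings at any depth below body.
descendants = root.xpath(
    "/html/body//*[self::h1 or self::h2 or self::h3]/text()"
)

print(children_only)  # ['Top']
print(descendants)    # ['Top', 'Nested', 'Deeper']
```

The // here is scoped to body, so only that subtree is searched rather than the whole document.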