I have an XML file:

<root>
<!-- all other kinds of xml elements including possibly other h1 -->

<dl> some text
  <dt>
    other text
  </dt>
</dl>
<!-- all other kinds of xml elements including possibly other h1 -->

<h1>
  <a>starting here</a>
</h1>

<dl>foo
  <dt>
    bar
  </dt>
</dl>
<dl>foo
  <dt>
    bar
  </dt>
</dl>

<!-- Many elements but all of them are dl -->

<dl>foo
  <dt>
    bar
  </dt>
</dl>
<dl>foo
  <dt>
    bar
  </dt>
</dl>

<h1>
  <a>Ending here</a>
</h1>

<!-- all other kinds of xml elements including possibly other h1 -->
<dl>foo
  <dt>
    bar
  </dt>
</dl>
<!-- all other kinds of xml elements including possibly other h1 -->

</root>

Now I'd like to select the <dl> nodes (with their children) between the two <h1> tags.

I have tried various combinations of the following and following-sibling axes, but without success.

Can anyone help me?

RedX

4 Answers

You can use the following XPath 1.0 expression to find the <dl> node between the <h1> tags:

//dl[preceding-sibling::*[1] = preceding-sibling::h1[1] and following-sibling::*[1] = following-sibling::h1[1]]

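This finds the <dl> elements whose immediately preceding sibling is an <h1> and whose immediately following sibling is also an <h1>, so it matches only a <dl> that sits alone between two <h1> elements. Note that in XPath 1.0 the = operator on node-sets compares string values, not node identity, so siblings that merely share the same text content can also match.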

Depending on the requirement, it could be simplified to:

//dl[preceding-sibling::h1 and following-sibling::h1]

This just finds all <dl> elements that have an <h1> sibling somewhere before them and another somewhere after them.

Note that your XML needs a single root element to be well-formed, and you generally can't evaluate XPath expressions against XML that is not well-formed.

Keith Hall

Select all dl elements that follow an h1 sibling and are themselves followed by an h1 sibling:

//h1/following-sibling::dl[following-sibling::h1]
splash58

Both previous answers are right for the given example with two h1 tags, but they would not work if the input is a little more complex, e.g. with three or four h1 tags.

If so, you can follow this excellent answer.

Here's an adaptation that gets the dl tags between the first and second h1 tags:

 //h1[1]/following-sibling::dl
      [count(.|//h1[2]/preceding-sibling::dl) = count(//h1[2]/preceding-sibling::dl)  ]
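The count(.|$set) = count($set) test above is a node-set intersection: a node is kept only if adding it to the second set does not make that set larger. As a rough illustration only, here is the same idea in Python. The standard library's xml.etree.ElementTree does not support sibling axes, so the two node-sets are built by slicing the list of children instead of by evaluating the XPath. The function name dls_between and the cut-down sample document are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Cut-down sample (an assumption for this sketch, not the asker's full file).
XML = """<root>
  <p/>
  <h1><a>starting here</a></h1>
  <dl>one</dl>
  <dl>two</dl>
  <h1><a>Ending here</a></h1>
  <dl>three</dl>
  <h1/>
</root>"""


def dls_between(root, start_index, end_index):
    """Return the dl children lying between two h1 children of root.

    start_index and end_index are 0-based positions among the h1 children
    (hypothetical parameters; the XPath above hard-codes h1[1] and h1[2]).
    """
    children = list(root)
    h1s = [c for c in children if c.tag == "h1"]
    start, end = h1s[start_index], h1s[end_index]

    # First node-set: dl siblings that follow the starting h1.
    after_start = [c for c in children[children.index(start) + 1:] if c.tag == "dl"]
    # Second node-set: dl siblings that precede the ending h1.
    before_end = [c for c in children[:children.index(end)] if c.tag == "dl"]

    # Intersection by node identity, which is what the count() trick computes.
    before_ids = {id(c) for c in before_end}
    return [c for c in after_start if id(c) in before_ids]


result = dls_between(ET.fromstring(XML), 0, 1)
print([d.text for d in result])  # ['one', 'two']
```

The third dl follows the starting h1 but does not precede the ending one, so it is in the first set only and is dropped by the intersection.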
hr_117
  • hr_117, Re: "*Both previous answers are right *" -- Unfortunately, SO displays the answers in seemingly random order, and my answer, written 2 days after your answer happens to be displayed as "previous" to yours. There is no way you knew two days ago what I would answer today. Obviously, some edit is needed in order to fix this, isn't it? :) – Dimitre Novatchev Jun 12 '16 at 18:02

Because the sequence of adjacent dl elements must be immediately preceded and followed by h1 siblings, one expression that selects all such sequences of dl elements is:

/*/dl[preceding-sibling::*[not(self::dl)][1][self::h1]
    and
      following-sibling::*[not(self::dl)][1][self::h1]
      ]
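To make the predicate concrete: for each dl it looks outward past any neighbouring dl elements and requires the first non-dl sibling on each side to be an h1. The sketch below applies the same rule procedurally in Python with the standard library's xml.etree.ElementTree, which cannot evaluate the sibling axes itself. It is an illustration, not a substitute for the XPath. The function name dl_runs_between_h1 and the compact sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Compact stand-in for a document with several h1 and dl siblings (assumed data).
XML = """<root>
  <a/><h1/><dl>x</dl><b/>
  <h1/><h1/><dl>A</dl><dl>B</dl><h1/>
  <c/><p/><dl>y</dl><h1/>
  <d/><h1/><dl>C</dl><h1/><e/>
</root>"""


def dl_runs_between_h1(root):
    """Return each run of adjacent dl children that is flanked by h1 on both sides."""
    children = list(root)  # element children only; comments and text are skipped
    runs = []
    i = 0
    while i < len(children):
        if children[i].tag != "dl":
            i += 1
            continue
        # Extend j to the first child after this run of adjacent dl elements.
        j = i
        while j < len(children) and children[j].tag == "dl":
            j += 1
        # Nearest non-dl sibling on each side of the run, if there is one.
        before = children[i - 1] if i > 0 else None
        after = children[j] if j < len(children) else None
        if (before is not None and before.tag == "h1"
                and after is not None and after.tag == "h1"):
            runs.append(children[i:j])
        i = j
    return runs


runs = dl_runs_between_h1(ET.fromstring(XML))
print([[d.text for d in run] for run in runs])  # [['A', 'B'], ['C']]
```

The dl elements x and y are rejected because one of their nearest non-dl neighbours (b and p respectively) is not an h1, which is the case the simpler "some h1 before and some h1 after" test would let through.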

You can find all h1 elements that immediately precede such a sequence of adjacent dls with this expression:

/*/h1[following-sibling::*[1][self::dl]
    and
      following-sibling::*[not(self::dl)][1][self::h1]
     ]

You can find all h1 elements that immediately follow such a sequence of adjacent dls with this expression:

/*/h1[preceding-sibling::*[1][self::dl]
    and
      preceding-sibling::*[not(self::dl)][1][self::h1]
     ]

Finally, here is an XSLT-based verification:

This transformation just evaluates the expressions for finding all wanted sequences of dls and copies them to the output.

It also evaluates the expressions that select the starting and ending h1s and also outputs the results.

Finally, for each pair of (starting h1, ending h1) it evaluates an XPath expression that selects all dls in this particular group and outputs the results.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/">
    <xsl:copy-of select="
    /*/dl[preceding-sibling::*[not(self::dl)][1][self::h1]
        and
          following-sibling::*[not(self::dl)][1][self::h1]
         ]"/>
========================
<xsl:variable name="vStartingH1s" select=
 "/*/h1[following-sibling::*[1][self::dl]
       and
        following-sibling::*[not(self::dl)][1][self::h1]
        ]"/>

    <xsl:copy-of select="$vStartingH1s"/>
========================
    <xsl:variable name="vEndingH1s" select=
    "/*/h1[preceding-sibling::*[1][self::dl]
          and
           preceding-sibling::*[not(self::dl)][1][self::h1]
           ]"/>
    <xsl:copy-of select="$vEndingH1s"/>
========================
    <xsl:for-each select="$vStartingH1s">
      <xsl:variable name="vPos" select="position()"/>
      <xsl:value-of select=
          "concat(&#xA;'========== Group ', $vPos, ' ==========&#xA;')"/>
      <xsl:copy-of select=
      "following-sibling::*
            [count(.| $vEndingH1s[position()=$vPos]/preceding-sibling::*)
            =
             count($vEndingH1s[position()=$vPos]/preceding-sibling::*)
            ]"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the following XML document (the provided one, expanded to contain two groups of wanted dls):

<root>
    <!-- all other kinds of xml elements including possibly other h1 -->
    <a/>
    <h1/>
    <dl> some text
        <dt>
        other text
        </dt>
    </dl>
    <!-- all other kinds of xml elements including possibly other h1 -->
    <b/>
    <h1/>
    <h1>
        <a>starting here</a>
    </h1>
    <dl>foo
        <dt>
        bar
        </dt>
    </dl>
    <dl>foo
        <dt>
        bar
        </dt>
    </dl>
    <!-- Many elements but all of them are dl -->
    <dl>foo
        <dt>
        bar
        </dt>
    </dl>
    <dl>foo
        <dt>
        bar
        </dt>
    </dl>
    <h1>
        <a>Ending here</a>
    </h1>
    <h1/>
    <c/>
    <h1/>
    <!-- all other kinds of xml elements including possibly other h1 -->
    <p/>
    <dl>foo
        <dt>
        bar
        </dt>
    </dl>
    <!-- all other kinds of xml elements including possibly other h1 -->
    <h1/>
    <d/>
    <h1>
        <a>starting here</a>
    </h1>
    <dl>foo
        <dt>
        bar
        </dt>
    </dl>
    <dl>foo
        <dt>
        bar
        </dt>
    </dl>
    <h1>
        <a>Ending here</a>
    </h1>
    <e/>
    <h1/>
    <f/>
</root>

the wanted, correct results are produced:

<dl>foo
        <dt>
        bar
        </dt>

</dl>
<dl>foo
        <dt>
        bar
        </dt>

</dl>
<dl>foo
        <dt>
        bar
        </dt>

</dl>
<dl>foo
        <dt>
        bar
        </dt>

</dl>
<dl>foo
        <dt>
        bar
        </dt>

</dl>
<dl>foo
        <dt>
        bar
        </dt>

</dl>
========================
<h1>

   <a>starting here</a>

</h1>
<h1>

   <a>starting here</a>

</h1>
========================
    <h1>

   <a>Ending here</a>

</h1>
<h1>

   <a>Ending here</a>

</h1>
========================
    ========== Group 1 ==========
<dl>foo
        <dt>
        bar
        </dt>

</dl>
<dl>foo
        <dt>
        bar
        </dt>

</dl>
<dl>foo
        <dt>
        bar
        </dt>

</dl>
<dl>foo
        <dt>
        bar
        </dt>

</dl>========== Group 2 ==========
<dl>foo
        <dt>
        bar
        </dt>

</dl>
<dl>foo
        <dt>
        bar
        </dt>

</dl>
Dimitre Novatchev