
I'm wondering how I can return only the portion of XML data that runs up to the next h2 tag. I have an XML node called SUMMARY, shown below with some example text:

 <SUMMARY>
  <h2>heading One</h2><p>paragraph text under heading one</p>
  <h2>Heading Two</h2><p>paragraph text under heading two</p>
  <h2>Heading Three</h2><p>paragraph text under heading three</p>
 </SUMMARY>

I'm currently using this, but it's not quite working:

<xsl:choose>
  <xsl:when test="contains(SUMMARY, ':')">
    <xsl:value-of select="substring-before(SUMMARY, '.')"/>.
  </xsl:when>
</xsl:choose>

Any help would be appreciated.

Thomas W
NickP
  • Your question is not clear. What exactly does "*only a portion of XML data up to the next h2 tag*" mean? -- Do note that there are no `h2` tags in the given XML; everything within the CDATA section is just a single string, with no markup. – michael.hor257k Aug 03 '17 at 01:46
  • XML node "SUMMARY" has a bunch of data separated by h2 headings with paragraphs of text below each. There are a total of 6 headings with text underneath. I just want to return the second. Is that not clear? – NickP Aug 03 '17 at 02:04
  • It would be a lot clearer if you posted the expected result. And also tell us which XSLT processor will you be using. – michael.hor257k Aug 03 '17 at 02:23
  • `<h2>Heading Two:</h2><p>paragraph text under heading two</p>` is the expected result. I'm using a content management system, so I'm not sure which processor it uses; I'm using an XML datasource which connects to an XSL file. – NickP Aug 03 '17 at 02:30
  • Here's how to find out: https://stackoverflow.com/questions/25244370/how-can-i-check-which-xslt-processor-is-being-used-in-solr/25245033#25245033 – michael.hor257k Aug 03 '17 at 02:54
  • libxslt - 1.0 was returned – NickP Aug 03 '17 at 03:38

2 Answers


In XSLT 1.0, to retrieve the content belonging to a specific milestone (in this case a heading element), I suggest one of the following two methods. Replace the <xsl:copy-of> part in these examples with whatever you want to do with the retrieved content.

1. Using keys:

<xsl:key name="content-by-heading" match="SUMMARY/p" 
  use="generate-id(preceding-sibling::*[self::h1|self::h2|self::h3|self::h4|self::h5|self::h6][1])"/>

<xsl:template match="h2">
  <xsl:copy-of select="key('content-by-heading', generate-id())"/>
</xsl:template>

2. Iterating over the following siblings of the heading:

<xsl:template match="h2">
  <xsl:apply-templates select="following-sibling::*[1]" mode="get-heading-content"/>
</xsl:template>

<xsl:template match="*" mode="get-heading-content">
  <xsl:copy-of select="."/>
  <xsl:apply-templates select="following-sibling::*[1]" mode="get-heading-content"/>
</xsl:template>

<!-- Stop iteration when we're at the next heading -->
<xsl:template match="h1|h2|h3|h4|h5|h6" mode="get-heading-content"/>
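Putting the key-based method together with the heading itself, a minimal complete XSLT 1.0 stylesheet that outputs only the second section (the result NickP asked for) might look like the sketch below. It assumes the datasource contains plain markup shaped like the SUMMARY example (no CDATA); the `xsl:output` settings and the hard-coded `h2[2]` are illustrative assumptions, not part of the answer above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Output settings are an assumption; adjust to what the CMS expects -->
  <xsl:output method="html" indent="yes"/>

  <!-- Index each p by the generated id of its nearest preceding heading -->
  <xsl:key name="content-by-heading" match="SUMMARY/p"
    use="generate-id(preceding-sibling::*[self::h1|self::h2|self::h3|self::h4|self::h5|self::h6][1])"/>

  <!-- Process only the second h2; nothing else reaches the built-in templates -->
  <xsl:template match="/">
    <xsl:apply-templates select="//SUMMARY/h2[2]"/>
  </xsl:template>

  <!-- Copy the heading, then every p keyed to it -->
  <xsl:template match="h2">
    <xsl:copy-of select="."/>
    <xsl:copy-of select="key('content-by-heading', generate-id())"/>
  </xsl:template>
</xsl:stylesheet>
```

Changing `h2[2]` to another position selects a different section. Note that the key only collects `p` elements, so any other content types under a heading would have to be added to its `match` pattern.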
Thomas W
  • Perfect! Thank you, Thomas. I haven't used XSL for a long time, hence I needed help with more complex issues like this one. – NickP Aug 03 '17 at 06:36

There are a total of 6 headings with text underneath. I just want to return the second. ... <h2>Heading Two:</h2><p>paragraph text under heading two</p> is the expected result.

To return a copy of the second heading and the second para, you can do simply:

<xsl:template match="SUMMARY">
    <xsl:copy-of select="h2[2] | p[2]"/>
</xsl:template>

IMPORTANT:
I notice you have removed the CDATA markup from the given example. I find this very confusing.

If the CDATA section is not there, then this is an extremely trivial problem (as the solution above shows). In fact, if this had been your question to begin with, I wouldn't have bothered to answer it at all.

OTOH, if the CDATA markup is there after all, and you have chosen to remove it in order to "simplify" your question, then the above solution won't work and the task becomes much more difficult.
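For the CDATA case, one possible XSLT 1.0 sketch treats the content of SUMMARY as a single string and cuts it with `substring-after()` and `substring-before()`. It assumes every heading in the escaped text is written literally as lowercase `<h2>`, and that the CMS serializes the result directly so that `disable-output-escaping` (an optional feature in XSLT 1.0) takes effect; both are assumptions, and string slicing like this is brittle compared with working on real markup.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Process only SUMMARY so other text in the datasource is not output -->
  <xsl:template match="/">
    <xsl:apply-templates select="//SUMMARY"/>
  </xsl:template>

  <!-- Assumes SUMMARY holds escaped markup (CDATA), i.e. one string -->
  <xsl:template match="SUMMARY">
    <!-- Text after the first "<h2>" -->
    <xsl:variable name="after-first" select="substring-after(., '&lt;h2&gt;')"/>
    <!-- Text after the second "<h2>", starting with "Heading Two</h2>..." -->
    <xsl:variable name="after-second" select="substring-after($after-first, '&lt;h2&gt;')"/>
    <!-- Cut at the third "<h2>" if there is one, otherwise keep the rest -->
    <xsl:variable name="section">
      <xsl:choose>
        <xsl:when test="contains($after-second, '&lt;h2&gt;')">
          <xsl:value-of select="substring-before($after-second, '&lt;h2&gt;')"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="$after-second"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <!-- Re-emit the string as markup, only if a second heading was found -->
    <xsl:if test="$after-second != ''">
      <xsl:value-of select="concat('&lt;h2&gt;', $section)" disable-output-escaping="yes"/>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

If the CMS hands the result tree on as a DOM instead of serializing it, `disable-output-escaping` is ignored and the markup comes out escaped, which is one reason the CDATA variant is so much harder.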

michael.hor257k
  • The CMS allows me to enter a comma-separated list of tag names that I would like to enclose in CDATA. But I removed CDATA as a correction. – NickP Aug 03 '17 at 06:21