Is it possible with XPath to get the concatenated text of all of the children of a node? I am looking for something like the jQuery .html() method.

For example, if I have the following XML:

<h3 class="title">
    <span class="content">this</span>
    <span class="content"> is</span>
    <span class="content"> some</span>
    <span class="content"> text</span>
</h3>

I would like an XPath query on "h3[@class='title']" that would give me "this is some text".

That is the real question, but if more context/background is helpful, here it is: I am using XPath and I used this post to help me write some complex XSL. My source XML looks like this.

<h3 class="title">Title</h3>
<p>
    <span class="content">Some</span>
    <span class="content"> text</span>
    <span class="content"> for</span>
    <span class="content"> this</span>
    <span class="content"> section</span>
</p>
<p>
    <span class="content">Another</span>
    <span class="content"> paragraph</span>
</p>
<h3 class="title">
    <span class="content">Title</span>
    <span class="content"> 2</span>
    <span class="content"> is</span>
    <span class="content"> complex</span>
</h3>
<p>
    <span class="content">Here</span>
    <span class="content"> is</span>
    <span class="content"> some</span>
    <span class="content"> text</span>
</p>

My output XML groups each <h3> with all of the <p> elements that follow it, up to the next <h3>. I wrote the XSL as follows:

<xsl:template match="h3[@class='title']">
...
    <xsl:apply-templates select="following-sibling::p[
        generate-id(preceding-sibling::h3[1][@class='title'][text()=current()/text()])
        =
        generate-id(current())
    ]"/>
...
</xsl:template>

The problem is that I use text() to identify h3s that are the same. In the example above, text() on the "Title 2 is complex" heading returns only whitespace, because the words sit inside the child spans. My thought was to use something like jQuery's .html() that would return "Title 2 is complex".

Update: This might help clarify. After the transform, the desired output for the above would look something like this:

<section>
    <title>Title</title>
    <p>
        <content>Some</content>
        <content> text</content>
        <content> for</content>
        <content> this</content>
        <content> section</content>
    </p>
    <p>
        <content>Another</content>
        <content> paragraph</content>
    </p>
</section>
<section>
    <title>
        <content>Title</content>
        <content> 2</content>
        <content> is</content>
        <content> complex</content>
    </title>
    <p>
        <content>Here</content>
        <content> is</content>
        <content> some</content>
        <content> text</content>
    </p>
</section>
Brian
  • A complete stylesheet has been added in response to the added complete desired output. –  Jun 09 '10 at 13:43
  • Here's a handy tool (for anyone else with similar task) http://codebeautify.org/Xpath-Tester – Rimian Sep 18 '15 at 11:21

1 Answer

h3[@class='title']/span[@class='content']/text()

Like this?

h3[@class='title']/descendant::*/text()

Or this?
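
For comparison, here is a minimal sketch of why both expressions can miss one of the two headings. text() selects only text nodes, so the first path needs a span child and the second needs some child element. An element's XPath string-value, which is what string(h3[@class='title']) returns, is the concatenation of all descendant text nodes, and normalize-space() collapses the indentation around them. The sketch is in Python rather than XSLT (an assumption made here so it can be run as-is). The standard library has no full XPath 1.0 engine, so itertext() stands in for string() and split/join stands in for normalize-space(). The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

SIMPLE = '<h3 class="title">Title</h3>'
COMPLEX = """<h3 class="title">
    <span class="content">this</span>
    <span class="content"> is</span>
    <span class="content"> some</span>
    <span class="content"> text</span>
</h3>"""


def string_value(elem):
    """Concatenate every descendant text node, like XPath string(elem)."""
    return "".join(elem.itertext())


def normalized_value(elem):
    """Collapse whitespace runs, roughly like XPath normalize-space(elem)."""
    return " ".join(string_value(elem).split())


simple = ET.fromstring(SIMPLE)
complex_ = ET.fromstring(COMPLEX)

# Both headings yield usable text, with or without child elements.
print(repr(normalized_value(simple)))
print(repr(normalized_value(complex_)))

# The direct text of the complex heading is only the indentation
# before the first span, which is why comparing text() fails there.
print(repr(complex_.text))
```

In the stylesheet, the same idea would mean comparing the elements themselves (their string-values) rather than their text() children.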

nuqqsa
  • Yes, this does seem to help, but it reverses my problem. Now it will pick up the text in "Title 2 is complex" but it will not pick up the text in "Title" where there are no spans. Is there XPath that will pick up both? The worst part is, there are currently only spans in there, but there really could be any tag. I am really hoping for a solution that will pull the text out of any tag. – Brian Jun 04 '10 at 14:17
  • A new solution: this will pick up all descendant's content (recursively!) – nuqqsa Jun 04 '10 at 14:25
  • Ah! It seems like the solution is your suggestion with a // operator in place of the span (which will select nodes in the document from the current node that match the selection no matter where they are). This seems to work. It seems like I can't put a code snippet in the comment, but the h3 section of the select clause now looks like this: h3[1][@class='title'][//text()=current()//text()]). – Brian Jun 04 '10 at 14:27
  • I tried the descendant solution in my XSL and it had the same effect as the /span.../text() option. It matches the complex title but not the simple one. It seems like //text() is the only one that matches both. – Brian Jun 04 '10 at 17:48
  • It is better to use `descendant-or-self` to retrieve the text of the element itself if the element has no children. – bman May 05 '16 at 06:15
  • The second one works for concatenating all of the children's text. But what about the node's own direct text? – Mohsen Abasi Jul 19 '17 at 11:49
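
The grouping from the question can also be sketched without comparing title text at all. In the original select, the generate-id() test already identifies the nearest preceding h3 as the current node, so the extra text() predicate appears redundant. The Python below is only an illustration of that positional grouping, not the asker's stylesheet. It assumes the source fragments are wrapped in a hypothetical <root> element and collects the result under a hypothetical <sections> element. build_sections and rename_spans are names invented here. The span-to-content renaming follows the desired output shown in the question.

```python
import copy
import xml.etree.ElementTree as ET

# Shortened version of the question's source, wrapped in an assumed <root>.
SOURCE = """<root>
<h3 class="title">Title</h3>
<p><span class="content">Some</span><span class="content"> text</span></p>
<p><span class="content">Another</span><span class="content"> paragraph</span></p>
<h3 class="title"><span class="content">Title</span><span class="content"> 2</span></h3>
<p><span class="content">Here</span><span class="content"> is</span></p>
</root>"""


def rename_spans(elem):
    """Turn <span class="content"> descendants into bare <content> elements."""
    for span in list(elem.iter("span")):
        if span.get("class") == "content":
            span.tag = "content"
            span.attrib.clear()
    return elem


def build_sections(root):
    """Attach each <p> to the nearest preceding <h3 class="title">."""
    out = ET.Element("sections")
    section = None
    for child in root:
        node = rename_spans(copy.deepcopy(child))
        if child.tag == "h3" and child.get("class") == "title":
            # A new heading starts a new section and becomes its <title>.
            section = ET.SubElement(out, "section")
            node.tag = "title"
            node.attrib.clear()
            section.append(node)
        elif child.tag == "p" and section is not None:
            section.append(node)
    return out


sections = build_sections(ET.fromstring(SOURCE))
for section in sections:
    title = " ".join("".join(section.find("title").itertext()).split())
    print(title, len(section.findall("p")))
```

Because membership is decided by position alone, a plain-text heading and a heading made of spans are handled the same way.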