I want to find the XPath to a tag that is inside a CDATA section. Below is the XML fragment.

<books>
 <book>
  <title></title>
  <content><![CDATA[<p>Hi hello Hw r u?</p><p>We are fine</p><p>Hi babeeee!!!!</p>]]>    </content>
 </book>
</books>

I want to get the text inside the first <p> tag within <content>. Can anybody please give the correct XPath for it?

Phil
Ammu
  • http://stackoverflow.com/questions/568315/how-do-i-retrieve-element-text-inside-cdata-markup-via-xpath – Ray Toal Aug 09 '11 at 04:55
  • Pretty sure you can't do that. CDATA is simply character data and does not represent any further document elements. – Phil Aug 09 '11 at 05:06

2 Answers

CDATA contains arbitrary character data. In contrast to PCDATA (parsed character data), it is not parsed, so there is no XPath to "elements" inside it.
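
To see this concretely, here is a minimal sketch using Python's standard xml.etree.ElementTree (my choice for illustration, not something the question uses). It parses the fragment from the question and checks that <content> has no child elements, only text:

```python
import xml.etree.ElementTree as ET

# The fragment from the question, unchanged.
XML = """<books>
 <book>
  <title></title>
  <content><![CDATA[<p>Hi hello Hw r u?</p><p>We are fine</p><p>Hi babeeee!!!!</p>]]>    </content>
 </book>
</books>"""

root = ET.fromstring(XML)
content = root.find("book/content")

# The CDATA section produced no child elements,
# so a path that tries to reach <p> matches nothing.
child_count = len(content)
p_matches = root.findall("book/content/p")

# The markup-looking characters are simply the element's text.
text = content.text

print(child_count, p_matches)
print(text)
```

Any conforming XML parser should behave the same way here: the <p> "tags" reach the application as plain characters.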

Leif

As Leif said, the content of the CDATA section is not parsed, so it's just text, even though it looks like markup. You'd have to parse it in a second step, which you could do with Saxon (9.1 or later, commercial editions) using saxon:parse. You'd then find that it isn't well formed (as a document it has three top-level <p> elements rather than a single root), so you'd probably have to resort to a lenient parser such as TagSoup.
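
The two-step idea can be sketched outside XSLT as well. The following uses Python's standard xml.etree.ElementTree rather than Saxon or TagSoup, and it assumes the CDATA text is well-formed apart from lacking a single root, which holds for the sample but not for arbitrary HTML. The function name first_paragraph and the <wrapper> element are hypothetical names introduced for this illustration:

```python
import xml.etree.ElementTree as ET

XML = """<books>
 <book>
  <title></title>
  <content><![CDATA[<p>Hi hello Hw r u?</p><p>We are fine</p><p>Hi babeeee!!!!</p>]]>    </content>
 </book>
</books>"""


def first_paragraph(xml_text):
    """Return the text of the first <p> stored as a string inside <content>."""
    # Step 1: parse the outer document; the CDATA arrives as plain text.
    outer = ET.fromstring(xml_text)
    raw = outer.findtext("book/content")

    # Step 2: wrap the text in a single (hypothetical) root element so it
    # becomes a well-formed document, then parse it a second time.
    fragment = ET.fromstring("<wrapper>" + raw + "</wrapper>")

    # Now the <p> elements are real nodes and can be addressed by path.
    return fragment.find("p").text


print(first_paragraph(XML))
```

If the embedded markup is real-world HTML (unclosed tags, undeclared entities), the second parse will raise an error, which is where a lenient parser comes in.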

You could also treat it as a string:

<xsl:stylesheet version="1.0"
  xmlns:saxon="http://saxon.sf.net/"
  exclude-result-prefixes="saxon"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <Root>
      <!--xsl:value-of select="saxon:parse(/books/book/content)"/-->
      <xsl:for-each select="books/book/content">
        <xsl:value-of select="
          substring-before(
          substring-after( . , '&gt;' ), '&lt;' ) "/>
      </xsl:for-each>
    </Root>
  </xsl:template>
</xsl:stylesheet>
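
For comparison, the same string treatment can be sketched in Python. The helpers substring_after and substring_before are hypothetical names written here to mirror the XPath 1.0 functions of the same purpose, including returning an empty string when the separator is absent:

```python
import xml.etree.ElementTree as ET

XML = """<books>
 <book>
  <title></title>
  <content><![CDATA[<p>Hi hello Hw r u?</p><p>We are fine</p><p>Hi babeeee!!!!</p>]]>    </content>
 </book>
</books>"""


def substring_after(s, sep):
    # Text following the first occurrence of sep, or "" if sep is absent.
    _, found, tail = s.partition(sep)
    return tail if found else ""


def substring_before(s, sep):
    # Text preceding the first occurrence of sep, or "" if sep is absent.
    head, found, _ = s.partition(sep)
    return head if found else ""


content_text = ET.fromstring(XML).findtext("book/content")

# Everything after the first ">" and before the next "<",
# i.e. the text of the first <p> in the sample.
first_p_text = substring_before(substring_after(content_text, ">"), "<")

print(first_p_text)
```

This is fragile by design: it relies on the first tag having no ">" inside an attribute value and on the first paragraph containing no nested tags.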
Lumi