
I'm transforming an XHTML file to XML. My problem is that the XHTML entity references are all getting swallowed in the process, i.e. entity references such as &copy; disappear from the output.

My code looks as follows:

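<?xml version="1.0" encoding="utf-8"?>
<!-- The opening xsl:stylesheet element was missing from the post; version="1.0" is assumed. -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Turn each heading into a <heading> element carrying its name and text. -->
  <xsl:template match="h1|h2|h3|h4|h5|h6|h7|h8|h9">
    <heading>
      <xsl:attribute name="name">
        <xsl:value-of select="name(.)" />
      </xsl:attribute>
      <xsl:attribute name="content">
        <xsl:value-of select="." />
      </xsl:attribute>
    </heading>
  </xsl:template>

  <!-- Only the headings that are direct children of body are mapped. -->
  <xsl:template match="/html/body">
    <mapping>
      <xsl:apply-templates select="h1|h2|h3|h4|h5|h6|h7|h8|h9" />
    </mapping>
  </xsl:template>
</xsl:stylesheet>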

In the output, all entity references disappear. I've tried adding the entity definitions to my XSL, with no luck.

Any suggestions?

Anton

Marcin
anthun
  • Can you please provide a sample of your input and your desired output? – GeoGriffin Mar 08 '12 at 17:14
  • Also, can you detail which processor you use and how you added the entity definitions? – BiAiB Mar 08 '12 at 17:16
  • Here's a similar question I asked a while back. http://stackoverflow.com/questions/5985615/preserving-entity-references-when-transforming-xml-with-xslt – Daniel Haley Mar 08 '12 at 17:18

2 Answers


Entity references require a DTD. Be sure that the source document includes a DTD and that you do not disable entity resolution.

What you want to happen is that &copy; in the input becomes © in the output document. You do not want entity references in the output document.
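As a minimal sketch of what "includes a DTD" can look like, the hypothetical input below declares the one entity it uses in an internal DTD subset, so the parser does not have to fetch anything external. The heading text is invented for illustration, and a real XHTML file would normally reference the full XHTML DTD and its entity sets instead. The elements are deliberately in no namespace, which is what the `/html/body` match in the question's stylesheet assumes.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Internal subset: declares &copy; so the parser can expand it. -->
<!DOCTYPE html [
  <!ENTITY copy "&#169;">
]>
<html>
  <body>
    <!-- After parsing, this text node contains the character U+00A9. -->
    <h1>Copyright &copy; 2012</h1>
  </body>
</html>
```

With the declaration in place, the parser hands the stylesheet the © character itself, not the reference, so nothing is left to be swallowed.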

Francis Avila
  • Entity references don't actually require a DTD, they just need to be declared. – Daniel Haley Mar 08 '12 at 23:19
  • Cheers mate, that does the trick. Unfortunately I still have a problem when adding the DOCTYPE declaration to the source XHTML file: the XSLT processor (AltovaXML) isn't able to resolve the external link to the DTD. I've been able to download the DTD and entity files and reference the local copies instead. – anthun Mar 09 '12 at 08:04
  • That is on purpose. The W3C makes sure those DTD links don't work for requests from XML processors because its servers get slammed with them. Most XML processors have a facility called an XML catalog to automatically reference local copies of DTDs without changing the system identifiers. – Francis Avila Mar 09 '12 at 08:50
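
For illustration, an OASIS XML catalog that maps the XHTML 1.0 Strict identifiers to a local copy might look like the sketch below. The local path `dtd/xhtml1-strict.dtd` is an assumption, and how a catalog is registered with the processor is processor-specific and not shown here.

```xml
<?xml version="1.0"?>
<!-- Sketch of an OASIS XML catalog; the local uri values are hypothetical. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Match on the public identifier used in the DOCTYPE declaration. -->
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="dtd/xhtml1-strict.dtd"/>
  <!-- Match on the system identifier as a fallback. -->
  <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
          uri="dtd/xhtml1-strict.dtd"/>
</catalog>
```

The XHTML DTD pulls in its entity files (such as `xhtml-lat1.ent`) by relative system identifier, so keeping those files next to the local DTD copy should let them resolve without further catalog entries.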

The entities are all expanded by the XML parser (conceptually) before XSLT starts. XSLT has no knowledge that entity references were used, so it cannot preserve them. If you don't want the non-ASCII characters to appear as characters, then the easiest solution is to specify an encoding such as

<xsl:output encoding="US-ASCII"/>

Then any non-ASCII characters will be written as decimal or hex numeric character references, so the copyright sign would come out as &#169; rather than © (assuming that your output is in fact serialised by XSLT).
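
Applied to the stylesheet from the question, the suggestion could look like the sketch below. The opening `xsl:stylesheet` element and `version="1.0"` are assumptions, since the posted code omitted them, and the heading match is shortened to `h1` through `h6` for brevity.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- US-ASCII cannot represent U+00A9, so the serializer has to
       write it as a numeric character reference. -->
  <xsl:output method="xml" encoding="US-ASCII" indent="yes"/>

  <xsl:template match="h1|h2|h3|h4|h5|h6">
    <heading>
      <xsl:attribute name="name">
        <xsl:value-of select="name(.)"/>
      </xsl:attribute>
      <xsl:attribute name="content">
        <xsl:value-of select="."/>
      </xsl:attribute>
    </heading>
  </xsl:template>

  <xsl:template match="/html/body">
    <mapping>
      <xsl:apply-templates select="h1|h2|h3|h4|h5|h6"/>
    </mapping>
  </xsl:template>
</xsl:stylesheet>
```

Whether the reference is written in decimal (`&#169;`) or hex (`&#xA9;`) is up to the processor; both denote the same character. This only takes effect when the processor serialises the result itself, not when the result tree is handed to other code as a DOM.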

David Carlisle