I'm using an XSLT 2.0 stylesheet to process some MathML documents. These documents contain entity references like &ApplyFunction; and &InvisibleTimes; that give me "entity not defined" errors. Is there a way to process documents with these entities without loading the MathML schema? (Saxon-HE cannot use xsl:import-schema…)

And just to be clear, I don't need to use the entities in my XSLT; I need to process XML documents that contain them.

There's an entity file for MathML like this:

<!ENTITY AElig            "&#x000C6;" ><!--LATIN CAPITAL LETTER AE -->
<!ENTITY AMP              "&#38;#38;" ><!--AMPERSAND -->
<!ENTITY Aacute           "&#x000C1;" ><!--LATIN CAPITAL LETTER A WITH ACUTE -->
...

Maybe I can somehow make use of that?

UPDATE: multiple people have mentioned that the input documents should have the correct DTD. So here's a minimal example:

The XSLT:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:m="http://www.w3.org/1998/Math/MathML">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:text>aaa</xsl:text>
  </xsl:template>
</xsl:stylesheet>

The MathML with DTD declaration:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN"
    "http://www.w3.org/Math/DTD/mathml2/mathml2.dtd">
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow> 
    <mi> sin </mi> 
    <mo> &ApplyFunction; </mo> 
    <mi> x </mi> 
  </mrow> 
</math>

Now Saxon gives me this error:

I/O error reported by XML parser processing file:/path/to/mathml.xml: unknown protocol: classpath
MetroWind
  • If you are getting those errors when reading the input XML, then it sounds like your XML is incomplete and you need the entities declared. Otherwise, the XML parser doesn't know what content to substitute for the entity reference. http://xmlwriter.net/xml_guide/entity_declaration.shtml – Mads Hansen Jul 12 '17 at 01:36
  • https://stackoverflow.com/a/9128457/14419 – Mads Hansen Jul 12 '17 at 01:46
  • @MadsHansen That's an interesting solution. But I'm afraid I cannot use it. First of all, for it to work, the included XML document cannot have a `<?xml ... ?>` header, which I don't have control over. The MathML documents I'm processing are dynamically generated, and will have the header. Also, the wrapper file needs to refer to the included file by name. If the included XML documents are dynamically generated, I'll also need to dynamically generate the wrapper file, which I don't think XSLT can do… – MetroWind Jul 12 '17 at 03:24
  • Do the MathML files you are trying to process reference the MathML DTD/entity files? – Martin Honnen Jul 12 '17 at 08:52
  • If I understand http://saxonica.com/documentation/index.html#!sourcedocs/w3c-dtds correctly then Saxon is already configured to load the MathML entities from a local cache as long as the XML input files reference them properly. So as with any XML trying to use entity references other than `&lt;`, `&gt;`, `&amp;`, `&quot;` and `&apos;`, your input files need to have a DTD that declares the entities directly or includes the official entity files doing that. – Martin Honnen Jul 12 '17 at 08:58

2 Answers

I've had success in the past by declaring the entities in the XSL file. For example:

<!DOCTYPE stylesheet [
<!ENTITY lsquo "<xsl:text disable-output-escaping='yes'>&amp;#x2018;</xsl:text>">
<!ENTITY rsquo "<xsl:text disable-output-escaping='yes'>&amp;#x2019;</xsl:text>">
<!ENTITY ldquo "<xsl:text disable-output-escaping='yes'>&amp;#x201C;</xsl:text>">
<!ENTITY rdquo "<xsl:text disable-output-escaping='yes'>&amp;#x201D;</xsl:text>">
]>

... added at the top of the file, just after the <?xml?> declaration and just before the <xsl:stylesheet> element. I suspect a similar approach would help in your case.

Eiríkr Útlendi
  • This will allow you to use these entity references in the stylesheet. It won't help if you want to use them in the source document. Also, defining them to expand to xsl:text instructions isn't useful if you want to use them in attributes. – Michael Kay Jul 14 '17 at 10:31
Just to reinforce the other answers/comments, entity expansion is the responsibility of the XML parser and has nothing to do with the XSLT processor. For the XML to be well-formed, the entities must be declared, which means you need to have an (internal or external) DTD that references them: that is, the source document must have a suitable DOCTYPE declaration.
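
As an illustration of the internal-DTD option, here is a minimal sketch of the question's sample document with the two entities declared in an internal subset. It assumes you are able to change how the documents are generated; the code points (U+2061 for ApplyFunction, U+2062 for InvisibleTimes) are the ones MathML assigns to these names.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Internal DTD subset: declares only the entities this document uses,
     so the parser does not need to fetch any external DTD. -->
<!DOCTYPE math [
  <!ENTITY ApplyFunction  "&#x2061;"> <!-- FUNCTION APPLICATION -->
  <!ENTITY InvisibleTimes "&#x2062;"> <!-- INVISIBLE TIMES -->
]>
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mi>sin</mi>
    <mo>&ApplyFunction;</mo>
    <mi>x</mi>
  </mrow>
</math>
```

With the declarations inline, the parser can expand the references itself, and the stylesheet sees the plain characters.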

The only contribution Saxon will make is that it makes its own EntityResolver available to the XML parser. The term "EntityResolver" is a bit of a misnomer, because it doesn't actually expand entity references like &InvisibleTimes;; all it does is to locate external DTD files to satisfy the system IDs and public IDs that appear in your DOCTYPE declaration.
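
If the external DTD cannot be located automatically, one possible way to map those IDs to a local copy is an OASIS XML catalog. The sketch below is an assumption-laden example, not a description of Saxon's built-in behaviour: the local path dtd/mathml2/mathml2.dtd is hypothetical, and you would need your own copy of the MathML 2.0 DTD and its entity files at that location.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- OASIS XML catalog: redirects the IDs from the DOCTYPE declaration
     to a local file. The uri values are hypothetical local paths. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//DTD MathML 2.0//EN"
          uri="dtd/mathml2/mathml2.dtd"/>
  <system systemId="http://www.w3.org/Math/DTD/mathml2/mathml2.dtd"
          uri="dtd/mathml2/mathml2.dtd"/>
</catalog>
```

Saxon's command line accepts a -catalog option for supplying such a file, though which supporting libraries it requires depends on the Saxon version, so check the documentation for the release you use.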

Michael Kay