Java, xml, XSLT: Prevent DTD-Validation

Question

I use the Java 6 XML API to apply an XSLT transformation to an HTML document from the web. The document is well-formed XHTML and therefore contains a valid DOCTYPE declaration (<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">). Now a problem occurs: during the transformation the XSLT processor tries to download the DTD, and the W3C server rejects the request with an HTTP 503 error (due to bandwidth limiting by the W3C).

How can I prevent the XSLT processor from downloading the DTD? I don't need my input document validated.

Source is:

import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

--

   String xslt = "<?xml version=\"1.0\"?>"+
   "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"+
   "    <xsl:output method=\"text\" />"+          
   "    <xsl:template match=\"//html/body//div[@id='bodyContent']/p[1]\"> "+
   "        <xsl:value-of select=\".\" />"+
   "     </xsl:template>"+
   "     <xsl:template match=\"text()\" />"+
   "</xsl:stylesheet>";

   try {
       Source xmlSource = new StreamSource("http://de.wikipedia.org/wiki/Right_Livelihood_Award");
       Source xsltSource = new StreamSource(new StringReader(xslt));
       TransformerFactory ft = TransformerFactory.newInstance();

       Transformer trans = ft.newTransformer(xsltSource);

       trans.transform(xmlSource, new StreamResult(System.out));
   }
   catch (Exception e) {
       e.printStackTrace();
   }

I have read related questions here on SO, but they all use a different XML API:

"DTD download error while parsing XHTML document in XOM"

Thanks!

score 5 · Accepted Answer · edited Aug 09 '13 at 14:41

I recently ran into this issue while unmarshalling XML with JAXB. The fix was to create a SAXSource from an XMLReader and an InputSource, then pass it to the JAXB Unmarshaller's unmarshal() method. To avoid loading the external DTD, I set a custom EntityResolver on the XMLReader.

SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
XMLReader xmlr = sp.getXMLReader();
xmlr.setEntityResolver(new EntityResolver() {
    public InputSource resolveEntity(String pid, String sid) throws SAXException {
        if (sid.equals("your remote dtd url here"))
            return new InputSource(new StringReader("actual contents of remote dtd"));
        throw new SAXException("unable to resolve remote entity, sid = " + sid);
    }
});
SAXSource ss = new SAXSource(xmlr, myInputSource);

As written, this custom entity resolver throws an exception if it is ever asked to resolve an entity other than the one you expect. If you would rather let the parser load any other remote entity itself, replace the "throw" line with "return null;". A null return value tells the parser to fall back to its default resolution.
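
To connect this back to the XSLT case in the question, here is a minimal sketch of the same idea wired into a Transformer. It is not the code I used with JAXB. The class name, the sample stylesheet and document, the unreachable DTD URL and the one-line stand-in DTD are all made up for illustration. The stand-in declares the single entity the sample uses, which shows why you might serve real DTD content locally.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class LocalDtdTransform {
    // Hypothetical stand-in for a locally stored DTD: it declares only the entity used below.
    static final String LOCAL_DTD = "<!ENTITY nbsp \"&#160;\">";

    // Sample stylesheet (assumption): print the text of the first paragraph.
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\"><xsl:value-of select=\"/html/body/p\"/></xsl:template>"
        + "</xsl:stylesheet>";

    // Sample input (assumption): the DOCTYPE points at a URL that cannot be fetched.
    static final String SAMPLE_XML =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
        + "\"http://invalid.invalid/xhtml1-transitional.dtd\">"
        + "<html><body><p>a&nbsp;b</p></body></html>";

    static String transform(String xslt, String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // XSLT pattern matching expects namespace-aware SAX events
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                if (systemId != null && systemId.endsWith(".dtd")) {
                    // Serve the DTD from memory, so no HTTP request is made.
                    return new InputSource(new StringReader(LOCAL_DTD));
                }
                return null; // anything else: let the parser use its default resolution
            }
        });

        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        // The SAXSource makes the transformer read the input through our configured reader.
        transformer.transform(new SAXSource(reader, new InputSource(new StringReader(xml))),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(SAMPLE_XSLT, SAMPLE_XML));
    }
}
```

For a real page you would pass an InputSource for the page's URL instead of the sample string, and LOCAL_DTD would be replaced by a copy of the DTD kept on disk or on the classpath.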

Just in case somebody has the same problem: this points in the right direction (that's why I accepted the answer). If you don't want to return the DTD, you can also return an empty one. — theomega, Aug 04 '11 at 19:18
Please fix capitalization: 'XmlReader' should be 'XMLReader' — wau, Jul 19 '13 at 16:46

score 4 · Answer 2 · edited Feb 02 '22 at 07:46

Try setting a feature in your DocumentBuilderFactory:

URL url = new URL(urlString);
InputStream is = url.openStream();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
DocumentBuilder db;
db = dbf.newDocumentBuilder();
Document result = db.parse(is);

Right now I'm experiencing the same problem inside XSLT 2.0 when calling the document() function to analyse external XHTML pages.
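
For the transformation in the question, the parsed Document can then be wrapped in a DOMSource. The following is only a sketch under some assumptions: the feature URI is Xerces-specific (the parser bundled with the JDK is derived from Xerces and should recognize it, but setFeature throws a ParserConfigurationException on parsers that do not), and the class name, sample stylesheet, sample document and unreachable DTD URL are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NoExternalDtdDom {
    // Xerces-specific feature: do not load the external DTD when not validating.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Sample stylesheet (assumption): print the text of the first paragraph.
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\"><xsl:value-of select=\"/html/body/p\"/></xsl:template>"
        + "</xsl:stylesheet>";

    // Sample input (assumption): the DOCTYPE points at a URL that cannot be fetched.
    static final String SAMPLE_XML =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
        + "\"http://invalid.invalid/xhtml1-transitional.dtd\">"
        + "<html><body><p>hello</p></body></html>";

    static String transform(String xslt, String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);              // needed for XSLT over a DOM tree
        dbf.setFeature(LOAD_EXTERNAL_DTD, false); // skip the DTD download
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(SAMPLE_XSLT, SAMPLE_XML));
    }
}
```

Note that with the DTD skipped, named entities that only the DTD declares are not available to the parser, so this fits documents that do not depend on them.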

score 2 · Answer 3 · answered Nov 19 '13 at 12:55

The previous answers led me to a solution, but it wasn't obvious to me, so here is a complete one:

private void convert(InputStream xsltInputStream, InputStream srcInputStream, OutputStream destOutputStream) throws SAXException, ParserConfigurationException,
        TransformerFactoryConfigurationError, TransformerException, IOException {
    //create a parser with a fake entity resolver to disable DTD download and validation
    XMLReader xmlReader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    xmlReader.setEntityResolver(new EntityResolver() {
        public InputSource resolveEntity(String pid, String sid) throws SAXException {
            return new InputSource(new ByteArrayInputStream(new byte[] {}));
        }
    });
    //create the transformer
    Source xsltSource = new StreamSource(xsltInputStream);
    Transformer transformer = TransformerFactory.newInstance().newTransformer(xsltSource);
    //create the source for the XML document which uses the reader with fake entity resolver
    Source xmlSource = new SAXSource(xmlReader, new InputSource(srcInputStream));
    transformer.transform(xmlSource, new StreamResult(destOutputStream));
}

score -1 · Answer 4 · answered Mar 21 '11 at 04:03

If you use

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

you can try to disable DTD validation with the following code:

 dbf.setValidating(false);

user668834

See the answer from Chris, it is exactly the same. – theomega Mar 21 '11 at 09:37

score -2 · Answer 5 · Chris · 2009-10-15T16:45:04.927

You need to use javax.xml.parsers.DocumentBuilderFactory:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource src = new InputSource("http://de.wikipedia.org/wiki/Right_Livelihood_Award");
// Pass the InputSource itself: getByteStream() is null when only a system ID (URL) was set.
Document xmlDocument = builder.parse(src);
DOMSource source = new DOMSource(xmlDocument);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer(xsltSource);
transformer.transform(source, new StreamResult(System.out));

edited Oct 15 '09 at 16:45

answered Oct 15 '09 at 16:06

Chris

Thanks for the answer, but this code actually throws the same exception: `java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd` You have to change the `src.getByteStream()` to `src` in line 5 to get it working at all but there is still the same exception. – theomega Oct 16 '09 at 14:23

This changes nothing. You can parse the document during the transformation from a stream source, or before the transformation into a DOMSource, but either way the missing-DTD exception will occur. So this "solution" solves nothing and only misleads. – mvmn Feb 29 '12 at 13:45
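
For completeness, the DOM route can probably be made to work by combining it with the entity-resolver idea from the other answers, since DocumentBuilder also has a setEntityResolver method. A rough sketch, with a made-up class name, sample document and unreachable DTD URL; the resolver hands back an empty DTD for every external entity, as suggested in the comment on the accepted answer:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class DomWithEmptyDtd {
    // Sample input (assumption): the DOCTYPE points at a URL that cannot be fetched.
    static final String SAMPLE_XML =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
        + "\"http://invalid.invalid/xhtml1-transitional.dtd\">"
        + "<html><body><p>hello</p></body></html>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed if the Document is later fed to XSLT
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                // Answer every external entity with empty content instead of fetching it.
                return new InputSource(new StringReader(""));
            }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Convenience helper for the demo: text content of the whole document element.
    static String textOf(String xml) throws Exception {
        return parse(xml).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textOf(SAMPLE_XML));
    }
}
```

The resulting Document can be wrapped in a DOMSource and passed to transformer.transform() exactly as in the answer above.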
