90

When I parse my XML file (variable f) in this method, I get an error:

C:\Documents and Settings\joe\Desktop\aicpcudev\OnlineModule\map.dtd (The system cannot find the path specified)

I know I do not have the dtd, nor do I need it. How can I parse this File object into a Document object while ignoring DTD reference errors?

private static Document getDoc(File f, String docId) throws Exception{
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.parse(f);


    return doc;
}
sblundy
joe

7 Answers

149

Try setting features on the DocumentBuilderFactory:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

dbf.setValidating(false);
dbf.setNamespaceAware(true);
dbf.setFeature("http://xml.org/sax/features/namespaces", false);
dbf.setFeature("http://xml.org/sax/features/validation", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

DocumentBuilder db = dbf.newDocumentBuilder();
...

Ultimately, I think the options are specific to the parser implementation. Here is some documentation for Xerces2 if that helps.
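As a minimal sketch of how this could be applied to the `getDoc` method from the question, the version below sets only the `load-external-dtd` feature, which several commenters report was sufficient. The class name `NoDtdParser` is made up for illustration, and the feature URI is Xerces-specific, so this assumes the default JDK parser or another Xerces-based implementation; other parsers may reject the feature with a `ParserConfigurationException`.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class; only the feature URI comes from the answer above.
class NoDtdParser {
    // Xerces-specific feature: when false, a non-validating parser skips the external DTD.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    static Document getDoc(File f) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false);                  // the feature only applies to non-validating parsing
        dbf.setFeature(LOAD_EXTERNAL_DTD, false);  // do not try to open map.dtd
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(f);
    }
}
```

Note that entities declared only in the skipped DTD would not be available to the parser, so this fits documents that do not depend on such entities.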

jt.
  • 24
    the last one (`load-external-dtd`) did the trick for me - thanks. – Amarghosh Dec 02 '09 at 09:19
  • 1
    While trying this, I got a _DOMException: NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces._. I fixed this with `dbf.setNamespaceAware(true);` – Tim Van Laer Sep 05 '13 at 08:03
  • Just to let you know, the last feature setting (as stated by @Amarghosh) works great with a SAXParserFactory. – Alexis Leclerc Jan 20 '14 at 15:55
  • 2
    For me the `load-external-dtd` setting was enough. – chris Dec 14 '15 at 08:54
  • Using all of the above features makes my code fail. Using just the last two (nonvalidating) features makes it work. – Purus Jul 12 '16 at 07:11
  • Per the most upvoted comment, `load-external-dtd` was the trick. Thanks! I was surprised that a feature with `http://apache.org/...` in the name fixes Java DTD parsing errors, but happy to have a fix that doesn't involve exceptions or XML modification. In my case, this fixed some intermittent failures when parsing Apple PLIST files and when either `apple.com` was unreachable or when `apple.com` couldn't respond to the HTTP request. Much obliged. :) – tresf May 07 '21 at 17:26
60

A similar approach to the one suggested by @anjanb:

    builder.setEntityResolver(new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            if (systemId.contains("foo.dtd")) {
                return new InputSource(new StringReader(""));
            } else {
                return null;
            }
        }
    });

I found that simply returning an empty InputSource worked just as well.
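To make the resolver approach concrete, here is a small self-contained sketch that substitutes an empty DTD for every external entity whose system ID ends in `.dtd`, rather than for one named file. The class and method names are hypothetical, and the `.dtd` suffix check is an assumption about how the DTDs are referenced. It has been reasoned through against the JDK's built-in parser only; as the comments on this answer suggest, results can differ between parser implementations.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class illustrating the EntityResolver technique.
class EmptyDtdResolverExample {
    static Document parseIgnoringDtds(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // EntityResolver has a single method, so a lambda can implement it.
        builder.setEntityResolver((publicId, systemId) ->
                systemId != null && systemId.endsWith(".dtd")
                        ? new InputSource(new StringReader("")) // empty stand-in for the DTD
                        : null);                                // default resolution for anything else
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

Calling `parseIgnoringDtds` on a document that references a missing `foo.dtd` should return the parsed `Document` instead of failing on the missing file.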

toolkit
  • 5
    Setting the features on DocumentBuilderFactory worked for me. The solution in this post did not work. – Kai Mechel May 25 '11 at 14:22
  • 4
    This also worked perfectly for me, even though I thought I didn't use SAX – devnull69 Mar 13 '13 at 14:44
  • Sadly this didn't work for me. I still got the error. @jt did it for me though. – Nils-o-mat Jul 26 '16 at 09:10
  • Thanks for the solution, this is the approach recommended by _org.xml_ I think. It looks like there is a lot of material on this topic. see http://xerces.apache.org/xml-commons/components/resolver/resolver-article.html, or https://en.wikipedia.org/wiki/XML_Catalog and the javadoc http://www.saxproject.org/apidoc/org/xml/sax/EntityResolver.html and http://www.saxproject.org/apidoc/org/xml/sax/ext/EntityResolver2.html – aliopi Aug 03 '17 at 06:04
6

I ran into this issue when the DTD file was in the jar file along with the XML. Based on the examples here, I solved it as follows:

DocumentBuilder db = dbf.newDocumentBuilder();
db.setEntityResolver(new EntityResolver() {
    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
        if (systemId.contains("doc.dtd")) {
            // Load the DTD from the classpath (inside the jar) instead of the file system
            InputStream dtdStream = MyClass.class
                    .getResourceAsStream("/my/package/doc.dtd");
            return new InputSource(dtdStream);
        } else {
            // Let the parser resolve everything else in the default way
            return null;
        }
    }
});
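One detail worth guarding against: `getResourceAsStream` returns `null` when the resource is not on the classpath, and an `InputSource` wrapping a `null` stream is unlikely to parse. The sketch below is one possible defensive variant, not the answerer's code. The class name `ClasspathDtdResolver` and its two constructor parameters are hypothetical, and falling back to an empty DTD is a design choice borrowed from the other answers on this page.

```java
import java.io.InputStream;
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical reusable resolver: serves one DTD from the classpath.
class ClasspathDtdResolver implements EntityResolver {
    private final String dtdName;      // e.g. "doc.dtd"
    private final String resourcePath; // e.g. "/my/package/doc.dtd"

    ClasspathDtdResolver(String dtdName, String resourcePath) {
        this.dtdName = dtdName;
        this.resourcePath = resourcePath;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null || !systemId.endsWith(dtdName)) {
            return null; // not our DTD: use the parser's default resolution
        }
        InputStream in = ClasspathDtdResolver.class.getResourceAsStream(resourcePath);
        if (in == null) {
            // Resource missing from the classpath: fall back to an empty DTD
            return new InputSource(new StringReader(""));
        }
        InputSource source = new InputSource(in);
        source.setSystemId(systemId); // keeps relative references inside the DTD resolvable
        return source;
    }
}
```

It would be installed with `db.setEntityResolver(new ClasspathDtdResolver("doc.dtd", "/my/package/doc.dtd"))`.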
Ahmed Ashour
Peter J
5

Source XML (With DTD)

<!DOCTYPE MYSERVICE SYSTEM "./MYSERVICE.DTD">
<MYACCSERVICE>
   <REQ_PAYLOAD>
      <ACCOUNT>1234567890</ACCOUNT>
      <BRANCH>001</BRANCH>
      <CURRENCY>USD</CURRENCY>
      <TRANS_REFERENCE>201611100000777</TRANS_REFERENCE>
   </REQ_PAYLOAD>
</MYACCSERVICE>

Java DOM implementation that accepts the above XML as a String and parses it without loading the DTD

public Document removeDTDFromXML(String payload) throws Exception {

    System.out.println("### Payload received in XMlDTDRemover: " + payload);

    Document doc = null;
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    try {

        dbf.setValidating(false);
        dbf.setNamespaceAware(true);
        dbf.setFeature("http://xml.org/sax/features/namespaces", false);
        dbf.setFeature("http://xml.org/sax/features/validation", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

        DocumentBuilder db = dbf.newDocumentBuilder();

        InputSource is = new InputSource();
        is.setCharacterStream(new StringReader(payload));
        doc = db.parse(is); 

    } catch (ParserConfigurationException e) {
        System.out.println("Parse Error: " + e.getMessage());
        return null;
    } catch (SAXException e) {
        System.out.println("SAX Error: " + e.getMessage());
        return null;
    } catch (IOException e) {
        System.out.println("IO Error: " + e.getMessage());
        return null;
    }
    return doc;

}

Destination XML (Without DTD)

<MYACCSERVICE>
   <REQ_PAYLOAD>
      <ACCOUNT>1234567890</ACCOUNT>
      <BRANCH>001</BRANCH>
      <CURRENCY>USD</CURRENCY>
      <TRANS_REFERENCE>201611100000777</TRANS_REFERENCE>
   </REQ_PAYLOAD>
</MYACCSERVICE> 
Shoaib Khan
2

"I know I do not have the dtd, nor do I need it."

I am suspicious of this statement; does your document contain any entity references (other than the predefined ones such as &amp; and &lt;)? If so, you definitely need the DTD that declares them.

Anyway, the usual way of preventing this from happening is using an XML catalog to define a local path for "map.dtd".
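On Java 9 and later, one way to do this is the `javax.xml.catalog` API, whose `CatalogResolver` can be installed as the builder's `EntityResolver`. The sketch below is illustrative only: the class name `CatalogExample`, the catalog file, and the `dtds/map.dtd` location are assumptions, not something from the question. Because the parser may hand the resolver an absolute system ID, the sample catalog entry matches on a suffix rather than on the exact string.

```java
import java.net.URI;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical example; assumes a catalog file such as:
//   <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
//     <systemSuffix systemIdSuffix="map.dtd" uri="dtds/map.dtd"/>
//   </catalog>
class CatalogExample {
    static CatalogResolver resolverFor(URI catalogUri) {
        // "continue" lets unmatched references fall through to default resolution
        // instead of raising a CatalogException (the "strict" default).
        CatalogFeatures features = CatalogFeatures.builder()
                .with(CatalogFeatures.Feature.RESOLVE, "continue")
                .build();
        return CatalogManager.catalogResolver(features, catalogUri);
    }

    static DocumentBuilder builderFor(URI catalogUri) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        db.setEntityResolver(resolverFor(catalogUri)); // CatalogResolver is an EntityResolver
        return db;
    }
}
```

The `uri` in the catalog entry is resolved relative to the catalog file, so the local copy of the DTD can live next to it.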

Edward Z. Yang
2

Here's another user who had the same issue: http://forums.sun.com/thread.jspa?threadID=284209&forumID=34

User ddssot on that post says:

myDocumentBuilder.setEntityResolver(new EntityResolver() {
    public InputSource resolveEntity(java.lang.String publicId, java.lang.String systemId)
            throws SAXException, java.io.IOException {
        // publicId is null for SYSTEM-only DOCTYPEs, so compare from the constant side
        if ("--myDTDpublicID--".equals(publicId)) {
            // this deactivates the open office DTD
            return new InputSource(new ByteArrayInputStream("<?xml version='1.0' encoding='UTF-8'?>".getBytes()));
        } else {
            return null;
        }
    }
});

The user further mentions "As you can see, when the parser hits the DTD, the entity resolver is called. I recognize my DTD with its specific ID and return an empty XML doc instead of the real DTD, stopping all validation..."

Hope this helps.

anjanb
0

I'm working with SonarQube, and SonarLint for Eclipse flagged my code with "Untrusted XML should be parsed without resolving external data" (squid:S2755).

I managed to solve it using:

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

    // If you can't completely disable DTDs, then at least do the following:
    // Xerces 1 - http://xerces.apache.org/xerces-j/features.html#external-general-entities
    // Xerces 2 - http://xerces.apache.org/xerces2-j/features.html#external-general-entities
    // JDK7+ - http://xml.org/sax/features/external-general-entities
    factory.setFeature("http://xml.org/sax/features/external-general-entities", false);

    // Xerces 1 - http://xerces.apache.org/xerces-j/features.html#external-parameter-entities
    // Xerces 2 - http://xerces.apache.org/xerces2-j/features.html#external-parameter-entities
    // JDK7+ - http://xml.org/sax/features/external-parameter-entities
    factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);

    // Disable external DTDs as well
    factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

    // and these as well, per Timothy Morgan's 2014 paper: "XML Schema, DTD, and Entity Attacks"
    factory.setXIncludeAware(false);
    factory.setExpandEntityReferences(false);
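For context on how the first setting behaves, the sketch below wraps a subset of the configuration above in a hypothetical `HardenedParser` class and checks whether a document is accepted. With `disallow-doctype-decl` enabled, a Xerces-based parser such as the JDK's built-in one reports a fatal error for any document containing a DOCTYPE, so that setting rejects a file like the one in the question rather than skipping its DTD. The class and method names are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical wrapper around the hardening settings shown above.
class HardenedParser {
    static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject any document that carries a DOCTYPE declaration
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory;
    }

    // Returns true if the XML parses, false if the parser rejects it.
    static boolean isAccepted(String xml) throws Exception {
        try {
            hardenedFactory().newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. a DOCTYPE was present
        }
    }
}
```

If the input must be allowed to carry a DOCTYPE, the remaining features in the answer (external entities and `load-external-dtd` switched off) are the ones that keep the parser from fetching the DTD.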
McCoy