I am having problems with the default Xerces DocumentBuilderFactory implementation bundled with Java 11. While parsing, it opens an HttpURLConnection, I assume to fetch a DTD or something similar. The request never completes and therefore blocks my whole app.

To avoid this problem, I would like to prevent DocumentBuilderFactory from performing any online requests. Is there a way to put it into some sort of offline mode?

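String xml = ...
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
docFactory.setNamespaceAware(false);
// Create the builder from the factory before parsing
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
return docBuilder.parse(new InputSource(new StringReader(xml)));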

The XML input comes from external sources, so I have no influence on its content. DocumentBuilderFactory has to process all XML data I receive, no matter what it contains; modifying the input is not an option.

Stack trace where it stops:

at java.net.SocketInputStream.socketRead0(java.base@11.0.10/Native Method)
at java.net.SocketInputStream.socketRead(java.base@11.0.10/SocketInputStream.java:115)
at java.net.SocketInputStream.read(java.base@11.0.10/SocketInputStream.java:168)
at java.net.SocketInputStream.read(java.base@11.0.10/SocketInputStream.java:140)
at java.io.BufferedInputStream.fill(java.base@11.0.10/BufferedInputStream.java:252)
at java.io.BufferedInputStream.read1(java.base@11.0.10/BufferedInputStream.java:292)
at java.io.BufferedInputStream.read(java.base@11.0.10/BufferedInputStream.java:351)
- locked <0x0000000581b21270> (a java.io.BufferedInputStream)
at sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.10/HttpClient.java:754)
at sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.10/HttpClient.java:689)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.10/HttpURLConnection.java:1615)
- locked <0x0000000581b21320> (a sun.net.www.protocol.http.HttpURLConnection)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.10/HttpURLConnection.java:1520)
- locked <0x0000000581b21320> (a sun.net.www.protocol.http.HttpURLConnection)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(java.xml@11.0.10/XMLEntityManager.java:676)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(java.xml@11.0.10/XMLEntityManager.java:1398)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(java.xml@11.0.10/XMLEntityManager.java:1364)
at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(java.xml@11.0.10/XMLDTDScannerImpl.java:257)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(java.xml@11.0.10/XMLDocumentScannerImpl.java:1152)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(java.xml@11.0.10/XMLDocumentScannerImpl.java:1040)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(java.xml@11.0.10/XMLDocumentScannerImpl.java:943)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(java.xml@11.0.10/XMLDocumentScannerImpl.java:605)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(java.xml@11.0.10/XMLDocumentFragmentScannerImpl.java:534)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(java.xml@11.0.10/XML11Configuration.java:888)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(java.xml@11.0.10/XML11Configuration.java:824)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(java.xml@11.0.10/XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(java.xml@11.0.10/DOMParser.java:246)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(java.xml@11.0.10/DocumentBuilderImpl.java:339)
JMax
  • Try this https://stackoverflow.com/a/155874/485343 – rustyx Mar 25 '21 at 08:51
  • @rustyx Thanks, that was very helpful. In my case, setting the feature `"http://apache.org/xml/features/nonvalidating/load-external-dtd"` to false did the trick (I verified with a debugger that it no longer reaches the code where the HttpURLConnection is created). I wasn't aware that common XML parsers really load DTDs. Isn't this a security problem? DTD URLs are typically plaintext http:// URLs, so an attacker on the network could modify or replace the DTD and thus change how the XML parser operates. – JMax Mar 25 '21 at 09:31
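
For reference, the fix described in the comment above might look roughly like the sketch below. The `load-external-dtd` feature is the one confirmed in the comment. The two `external-*-entities` features are an assumption on my side: they are commonly disabled alongside it to stop entity-triggered fetches, but they were not part of the original fix and they change how external entity references are handled. The class name `OfflineXml`, the method `parse`, and the `example.invalid` DTD URL are hypothetical and only serve the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class OfflineXml {

    // Hypothetical helper: parses XML without fetching the external DTD.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        docFactory.setNamespaceAware(false);

        // Confirmed in the comment above: do not load the external DTD subset
        // when the parser is non-validating.
        docFactory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

        // Assumption, not part of the original fix: also skip external general
        // and parameter entities, which could otherwise trigger network access.
        docFactory.setFeature(
                "http://xml.org/sax/features/external-general-entities", false);
        docFactory.setFeature(
                "http://xml.org/sax/features/external-parameter-entities", false);

        DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
        return docBuilder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The DOCTYPE points at a host that does not resolve; with the features
        // above, the parser should not try to contact it.
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE root SYSTEM \"http://example.invalid/root.dtd\">"
                + "<root><child/></root>";
        Document doc = parse(xml);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Note that the features are set on the factory before `newDocumentBuilder()` is called, so every builder created from that factory inherits them.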

0 Answers