35

When using XmlDocument.Load, I am finding that if the document refers to a DTD, a connection is made to the provided URI. Is there any way to prevent this from happening?

spender

5 Answers

37

After some more digging: try setting the XmlResolver property of the XmlReaderSettings object to null.

'The XmlResolver is used to locate and open an XML instance document, or to locate and open any external resources referenced by the XML instance document. This can include entities, DTD, or schemas.'

So the code would look like this:

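        XmlReaderSettings settings = new XmlReaderSettings();
        // No resolver, so the reader cannot fetch external resources such as the DTD.
        settings.XmlResolver = null;
        // Allow the DOCTYPE declaration instead of rejecting the document.
        settings.DtdProcessing = DtdProcessing.Parse;
        XmlDocument doc = new XmlDocument();
        using (StringReader sr = new StringReader(xml))
        using (XmlReader reader = XmlReader.Create(sr, settings))
        {
            doc.Load(reader);
        }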
Tim Rogers
Richard Nienaber
  • Also required is: settings.ProhibitDtd = false; otherwise, right on the money. Cheers! – spender Oct 19 '08 at 22:43
  • This is a useful trick, but keep in mind that it won't work for all XML documents. If the document actually references the DTD in some way (such as an entity reference), then you'll get an XML exception when you try to read the document. – Peter Ruderman Dec 06 '10 at 17:16
  • I tried this and also found that it works. However, I was a little troubled by this comment in the MSDN docs: "If set to null, an XmlException is thrown when the XmlReader tries to access an external resource". This clearly isn't happening, but does anybody know why? – Steg Dec 22 '10 at 10:44
  • After some experimentation, I found that setting DtdProcessing to Parse still goes out to get the DTDs. This did the trick and stopped it for me: XmlReaderSettings settings = new XmlReaderSettings {DtdProcessing = DtdProcessing.Ignore}; – hoserdude Jul 29 '13 at 20:04
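Combining the answer with hoserdude's comment, a minimal sketch might look like the following. It assumes .NET 4.0 or later (where DtdProcessing exists); the class name, method name, and the example.invalid URI are made up for illustration. As Peter Ruderman's comment warns, a document that uses entities declared in the DTD will fail to load this way.

```csharp
using System;
using System.IO;
using System.Xml;

static class IgnoreDtdExample
{
    // Loads XML text while skipping any DOCTYPE declaration it contains.
    public static XmlDocument LoadIgnoringDtd(string xml)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Ignore, // do not process the DOCTYPE
            XmlResolver = null                    // no resolver for external resources
        };

        var doc = new XmlDocument();
        doc.XmlResolver = null; // the document has its own resolver; clear it too
        using (var sr = new StringReader(xml))
        using (var reader = XmlReader.Create(sr, settings))
        {
            doc.Load(reader);
        }
        return doc;
    }

    static void Main()
    {
        // Hypothetical input: example.invalid is a placeholder host.
        string xml =
            "<!DOCTYPE root SYSTEM \"http://example.invalid/root.dtd\">" +
            "<root><child>text</child></root>";

        XmlDocument doc = LoadIgnoringDtd(xml);
        Console.WriteLine(doc.DocumentElement.Name);
    }
}
```

Setting XmlResolver to null on both the reader settings and the XmlDocument is a belt-and-braces choice here; with DtdProcessing.Ignore the reader should not need to resolve anything.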
7

The document being loaded HAS a DTD.

With:

settings.ProhibitDtd = true;

I see the following exception:

Service cannot be started. System.Xml.XmlException: For security reasons DTD is prohibited in this XML document. To enable DTD processing set the ProhibitDtd property on XmlReaderSettings to false and pass the settings into XmlReader.Create method.

So, it looks like ProhibitDtd MUST be set to false in this instance.

It looked like ValidationType would do the trick, but with:

settings.ValidationType = ValidationType.None;

I'm still seeing a connection to the DTD URI.

svick
spender
5

This is actually a flaw in the XML specifications. The W3C has complained that clients hammer its servers, loading the same schemas billions of times. Unfortunately, almost no standard XML library gets this right; they all hit the servers over and over again.

The problem with DTDs is particularly serious, because DTDs may include general entity declarations (for things like &amp; -> &) which the XML file may actually rely upon. So if your parser chooses to forgo loading the DTD, and the XML makes use of general entity references, parsing may actually fail.

The only solution to this problem would be a transparent caching entity resolver, which would put the downloaded files into some archive in the library search path, so that this archive would be created dynamically and almost automatically bundled with any software distribution. But even in the Java world there is not one decent EntityResolver of this kind floating about, and certainly none built into anything from the Apache Foundation.
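In .NET, a much simpler cousin of that idea can be sketched by subclassing XmlUrlResolver and overriding GetEntity so that known DTD URIs are served from local files. This is only an illustration, not a transparent cache: the LocalDtdResolver class, the URI-to-path map, the example.invalid URI, and the doc.dtd file are all hypothetical, and anything not in the map is refused rather than downloaded.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical resolver: serves registered URIs from local files and
// refuses everything else, so no network request is made.
class LocalDtdResolver : XmlUrlResolver
{
    private readonly Dictionary<string, string> _localCopies;

    public LocalDtdResolver(Dictionary<string, string> localCopies)
    {
        _localCopies = localCopies;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        string path;
        if (_localCopies.TryGetValue(absoluteUri.AbsoluteUri, out path))
        {
            return File.OpenRead(path); // the reader consumes and closes this stream
        }
        throw new XmlException("No local copy registered for " + absoluteUri);
    }
}

static class LocalDtdExample
{
    static void Main()
    {
        // Write a tiny stand-in DTD to a temp file (assumption for the demo).
        string dtdPath = Path.Combine(Path.GetTempPath(), "doc.dtd");
        File.WriteAllText(dtdPath, "<!ENTITY greeting \"hello\">");

        var resolver = new LocalDtdResolver(new Dictionary<string, string>
        {
            { "http://example.invalid/doc.dtd", dtdPath }
        });

        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            XmlResolver = resolver
        };

        string xml =
            "<!DOCTYPE root SYSTEM \"http://example.invalid/doc.dtd\">" +
            "<root>&greeting;</root>";

        var doc = new XmlDocument();
        doc.XmlResolver = resolver;
        using (var sr = new StringReader(xml))
        using (var reader = XmlReader.Create(sr, settings))
        {
            doc.Load(reader);
        }
        Console.WriteLine(doc.DocumentElement.InnerText);
    }
}
```

A real DTD often pulls in further files (entity sets, modules), and each of those URIs would need its own entry in the map.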

spender
3

Try something like this:

XmlDocument doc = new XmlDocument();
using (StringReader sr = new StringReader(xml))
using (XmlReader reader = XmlReader.Create(sr, new XmlReaderSettings()))
{
    doc.Load(reader);
}

The thing to note here is that XmlReaderSettings has the ProhibitDtd property set to true by default.

Richard Nienaber
1

Use an XmlReader to load the document and set the ValidationType property of the reader settings to None.

muratgu
  • That won't help you if the XML uses entity references defined in the DTD, unfortunately, because that makes the XML non-well-formed, not invalid. – Robert Rossney Oct 19 '08 at 05:59
  • So, I'm left feeling that it is necessary to process the DTD in order to correctly process entity references. How would this work in the absence of a connection though? – spender Oct 19 '08 at 12:00
  • It wouldn't. If your document can contain entity references that are defined in a DTD, the parser needs the DTD. So you have to either include the DTD in the XML you're trying to parse or cache the DTD locally. This is one reason I don't like using entity references. – Robert Rossney Oct 19 '08 at 20:05
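To illustrate the first option Robert Rossney mentions, here is a minimal sketch in which the entity declaration sits in the document's internal DTD subset, so the parser has what it needs without any external lookup. The greeting entity, the class name, and the method name are made up for the example, and .NET 4.0 or later is assumed for DtdProcessing.

```csharp
using System;
using System.IO;
using System.Xml;

static class InlineDtdExample
{
    // Parses XML whose DTD is fully contained in the internal subset.
    public static XmlDocument LoadWithInternalSubset(string xml)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // the DTD must be processed to expand entities
            XmlResolver = null                   // nothing external may be fetched
        };

        var doc = new XmlDocument();
        doc.XmlResolver = null;
        using (var sr = new StringReader(xml))
        using (var reader = XmlReader.Create(sr, settings))
        {
            doc.Load(reader);
        }
        return doc;
    }

    static void Main()
    {
        // The entity is declared inline, so no DTD URI appears at all.
        string xml =
            "<!DOCTYPE root [ <!ENTITY greeting \"hello\"> ]>" +
            "<root>&greeting;</root>";

        XmlDocument doc = LoadWithInternalSubset(xml);
        Console.WriteLine(doc.DocumentElement.InnerText);
    }
}
```

This only helps when you control the XML being parsed; for third-party documents that point at an external DTD, a local copy served by a resolver is the other route described in the comment.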