When you create a new XDocument using XDocument.Load, does it open the XML file and keep a local copy, or does it continuously read the document from the hard drive? If it does continuously read, is there a faster way to parse XML?

XDocument x = XDocument.Load("file.xml");
Alex Angas
redcodefinal

3 Answers

There are a couple of measurements to consider:

  1. Linear traversal speed (e.g. reading/loading)
  2. On-demand query speed

To answer the immediate question: XDocument uses an XmlReader to load the document into memory by reading each element and creating corresponding XElement instances (see code below). As such, it should be quite fast (fast enough for most purposes), but it may consume a large amount of memory when parsing a large document.

A raw XmlReader is an excellent choice for traversal if your needs are limited to that which can be done without retaining the document in memory. It will outperform other methods since no significant structure is created nor resolved with relation to other nodes (e.g. linking parent and child nodes). However, on-demand query ability is almost non-existent; you can react to values found in each node, but you can't query the document as a whole. If you need to look at the document a second time, you have to traverse the whole thing again.
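As a minimal sketch of that forward-only style, the example below counts elements and sums an attribute in a single pass without building a tree. The `item` element, the `price` attribute, and the `ReaderTraversal` class are hypothetical names chosen for illustration; they do not come from the question.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml;

public static class ReaderTraversal
{
    // Counts <item> elements and sums their "price" attributes in one
    // forward-only pass. Nothing is retained except the running totals.
    public static (int Count, decimal Total) Summarize(TextReader source)
    {
        int count = 0;
        decimal total = 0m;

        using (XmlReader reader = XmlReader.Create(source))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "item")
                {
                    count++;
                    string price = reader.GetAttribute("price");
                    if (price != null)
                    {
                        total += decimal.Parse(price, CultureInfo.InvariantCulture);
                    }
                }
            }
        }

        return (count, total);
    }

    public static void Main()
    {
        // Hypothetical sample input; a real program would likely read a file.
        const string xml = "<items><item price=\"1.50\"/><item price=\"2.25\"/><other/></items>";
        var result = Summarize(new StringReader(xml));
        Console.WriteLine($"{result.Count} items, total {result.Total}");
    }
}
```

Answering a second question about the same data would mean creating a new reader and traversing the input again.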

By comparison, an XDocument will take longer to traverse because it instantiates new objects and performs basic structural tasks. It will also consume memory proportionate to the size of the source. In exchange for these trade-offs, you gain excellent query abilities.
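To illustrate the query side of that trade-off, here is a small sketch that parses a document once and then runs two independent LINQ queries against the in-memory tree. The XML shape and the `DocumentQueries` helper are assumptions made up for this example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DocumentQueries
{
    // Counts all <item> elements anywhere in the tree.
    public static int CountItems(XDocument doc)
    {
        return doc.Descendants("item").Count();
    }

    // Returns the "name" attribute of the <item> with the highest "price".
    public static string MostExpensive(XDocument doc)
    {
        return doc.Descendants("item")
                  .OrderByDescending(e => (decimal)e.Attribute("price"))
                  .Select(e => (string)e.Attribute("name"))
                  .First();
    }

    public static void Main()
    {
        // Parsed once; both queries below reuse the same tree.
        XDocument doc = XDocument.Parse(
            "<items><item name=\"a\" price=\"1.50\"/><item name=\"b\" price=\"2.25\"/></items>");

        Console.WriteLine(CountItems(doc));
        Console.WriteLine(MostExpensive(doc));
    }
}
```

Neither query touches the original source again; both walk the XElement objects created at parse time.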

It may be possible to combine the approaches, as mentioned by Jon Skeet and shown here: Streaming Into LINQ to XML Using C# Custom Iterators and XmlReader.
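One common way to combine the two, sketched here under assumptions rather than taken from the linked article, is an iterator that advances an XmlReader and materializes only the matching elements with `XNode.ReadFrom`. The `StreamingElements` class and the `item` element name are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StreamingElements
{
    // Lazily yields each element with the given local name as a standalone
    // XElement. Only one such subtree is built at a time.
    public static IEnumerable<XElement> Stream(TextReader source, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(source))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == elementName)
                {
                    // ReadFrom builds the element and leaves the reader on the
                    // node that follows it, so no extra Read() is needed here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    public static void Main()
    {
        const string xml = "<items><item id=\"1\"/><skip/><item id=\"2\"/></items>";
        foreach (XElement item in Stream(new StringReader(xml), "item"))
        {
            Console.WriteLine((string)item.Attribute("id"));
        }
    }
}
```

Each yielded element can be queried with LINQ to XML, while memory use stays roughly proportional to the largest single element rather than the whole document. A matching element nested inside another matching element is returned as part of its parent, not separately.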

Source for XDocument Load()

public static XDocument Load(Stream stream, LoadOptions options)
{
    XmlReaderSettings xmlReaderSettings = XNode.GetXmlReaderSettings(options);
    XDocument result;
    using (XmlReader xmlReader = XmlReader.Create(stream, xmlReaderSettings))
    {
        result = XDocument.Load(xmlReader, options);
    }
    return result;
}

// which calls...

public static XDocument Load(XmlReader reader, LoadOptions options)
{
    if (reader == null)
    {
        throw new ArgumentNullException("reader");
    }
    if (reader.ReadState == ReadState.Initial)
    {
        reader.Read();
    }
    XDocument xDocument = new XDocument();
    if ((options & LoadOptions.SetBaseUri) != LoadOptions.None)
    {
        string baseURI = reader.BaseURI;
        if (baseURI != null && baseURI.Length != 0)
        {
            xDocument.SetBaseUri(baseURI);
        }
    }
    if ((options & LoadOptions.SetLineInfo) != LoadOptions.None)
    {
        IXmlLineInfo xmlLineInfo = reader as IXmlLineInfo;
        if (xmlLineInfo != null && xmlLineInfo.HasLineInfo())
        {
            xDocument.SetLineInfo(xmlLineInfo.LineNumber, xmlLineInfo.LinePosition);
        }
    }
    if (reader.NodeType == XmlNodeType.XmlDeclaration)
    {
        xDocument.Declaration = new XDeclaration(reader);
    }
    xDocument.ReadContentFrom(reader, options);
    if (!reader.EOF)
    {
        throw new InvalidOperationException(Res.GetString("InvalidOperation_ExpectedEndOfFile"));
    }
    if (xDocument.Root == null)
    {
        throw new InvalidOperationException(Res.GetString("InvalidOperation_MissingRoot"));
    }
    return xDocument;
}

// which calls...

internal void ReadContentFrom(XmlReader r, LoadOptions o)
{
    if ((o & (LoadOptions.SetBaseUri | LoadOptions.SetLineInfo)) == LoadOptions.None)
    {
        this.ReadContentFrom(r);
        return;
    }
    if (r.ReadState != ReadState.Interactive)
    {
        throw new InvalidOperationException(Res.GetString("InvalidOperation_ExpectedInteractive"));
    }
    XContainer xContainer = this;
    XNode xNode = null;
    NamespaceCache namespaceCache = default(NamespaceCache);
    NamespaceCache namespaceCache2 = default(NamespaceCache);
    string text = ((o & LoadOptions.SetBaseUri) != LoadOptions.None) ? r.BaseURI : null;
    IXmlLineInfo xmlLineInfo = ((o & LoadOptions.SetLineInfo) != LoadOptions.None) ? (r as IXmlLineInfo) : null;
    while (true)
    {
        string baseURI = r.BaseURI;
        switch (r.NodeType)
        {
        case XmlNodeType.Element:
        {
            XElement xElement = new XElement(namespaceCache.Get(r.NamespaceURI).GetName(r.LocalName));
            if (text != null && text != baseURI)
            {
                xElement.SetBaseUri(baseURI);
            }
            if (xmlLineInfo != null && xmlLineInfo.HasLineInfo())
            {
                xElement.SetLineInfo(xmlLineInfo.LineNumber, xmlLineInfo.LinePosition);
            }
            if (r.MoveToFirstAttribute())
            {
                do
                {
                    XAttribute xAttribute = new XAttribute(namespaceCache2.Get((r.Prefix.Length == 0) ? string.Empty : r.NamespaceURI).GetName(r.LocalName), r.Value);
                    if (xmlLineInfo != null && xmlLineInfo.HasLineInfo())
                    {
                        xAttribute.SetLineInfo(xmlLineInfo.LineNumber, xmlLineInfo.LinePosition);
                    }
                    xElement.AppendAttributeSkipNotify(xAttribute);
                }
                while (r.MoveToNextAttribute());
                r.MoveToElement();
            }
            xContainer.AddNodeSkipNotify(xElement);
            if (r.IsEmptyElement)
            {
                goto IL_30A;
            }
            xContainer = xElement;
            if (text != null)
            {
                text = baseURI;
                goto IL_30A;
            }
            goto IL_30A;
        }
        case XmlNodeType.Text:
        case XmlNodeType.Whitespace:
        case XmlNodeType.SignificantWhitespace:
            if ((text != null && text != baseURI) || (xmlLineInfo != null && xmlLineInfo.HasLineInfo()))
            {
                xNode = new XText(r.Value);
                goto IL_30A;
            }
            xContainer.AddStringSkipNotify(r.Value);
            goto IL_30A;
        case XmlNodeType.CDATA:
            xNode = new XCData(r.Value);
            goto IL_30A;
        case XmlNodeType.EntityReference:
            if (!r.CanResolveEntity)
            {
                goto Block_25;
            }
            r.ResolveEntity();
            goto IL_30A;
        case XmlNodeType.ProcessingInstruction:
            xNode = new XProcessingInstruction(r.Name, r.Value);
            goto IL_30A;
        case XmlNodeType.Comment:
            xNode = new XComment(r.Value);
            goto IL_30A;
        case XmlNodeType.DocumentType:
            xNode = new XDocumentType(r.LocalName, r.GetAttribute("PUBLIC"), r.GetAttribute("SYSTEM"), r.Value, r.DtdInfo);
            goto IL_30A;
        case XmlNodeType.EndElement:
        {
            if (xContainer.content == null)
            {
                xContainer.content = string.Empty;
            }
            XElement xElement2 = xContainer as XElement;
            if (xElement2 != null && xmlLineInfo != null && xmlLineInfo.HasLineInfo())
            {
                xElement2.SetEndElementLineInfo(xmlLineInfo.LineNumber, xmlLineInfo.LinePosition);
            }
            if (xContainer == this)
            {
                return;
            }
            if (text != null && xContainer.HasBaseUri)
            {
                text = xContainer.parent.BaseUri;
            }
            xContainer = xContainer.parent;
            goto IL_30A;
        }
        case XmlNodeType.EndEntity:
            goto IL_30A;
        }
        break;
        IL_30A:
        if (xNode != null)
        {
            if (text != null && text != baseURI)
            {
                xNode.SetBaseUri(baseURI);
            }
            if (xmlLineInfo != null && xmlLineInfo.HasLineInfo())
            {
                xNode.SetLineInfo(xmlLineInfo.LineNumber, xmlLineInfo.LinePosition);
            }
            xContainer.AddNodeSkipNotify(xNode);
            xNode = null;
        }
        if (!r.Read())
        {
            return;
        }
    }
    goto IL_2E1;
    Block_25:
    throw new InvalidOperationException(Res.GetString("InvalidOperation_UnresolvedEntityReference"));
    IL_2E1:
    throw new InvalidOperationException(Res.GetString("InvalidOperation_UnexpectedNodeType", new object[]
    {
        r.NodeType
    }));
}
Tim M.
It parses the incoming stream (whether it comes from a file or a string doesn't matter) when you call Load() and then keeps an instance of the document in memory. Since the source can be anything (a NetworkStream, a DataReader, a string entered by the user), it couldn't go back and read the data again, because it has no way of knowing the state of the source (the stream may have been closed, for example).
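A quick way to see this for yourself is to load from a stream, dispose the stream, and then query the document anyway. This is only a sketch; the `LoadOnce` class and the sample XML are invented for the demonstration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class LoadOnce
{
    // Loads the document and disposes the source stream before returning.
    public static XDocument LoadAndClose(byte[] xmlBytes)
    {
        XDocument doc;
        using (var stream = new MemoryStream(xmlBytes))
        {
            doc = XDocument.Load(stream);
        } // the stream is disposed here

        return doc;
    }

    public static void Main()
    {
        byte[] bytes = Encoding.UTF8.GetBytes("<config><name>demo</name></config>");
        XDocument doc = LoadAndClose(bytes);

        // The source is gone, yet the tree is still fully queryable.
        Console.WriteLine((string)doc.Root.Element("name"));
    }
}
```

For the same reason, a document that several objects need can be loaded once and shared instead of being loaded again by each of them.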

If you really want speed, on the other hand, XDocument isn't the fastest option (although it is easier to work with), since it first has to parse the document and then retain it in memory. If you are working with really large documents, an approach based on System.Xml.XmlReader is usually much faster, since it reads the document as a stream and doesn't need to retain anything except the current element. This benchmark shows some interesting figures about this.

Karl-Johan Sjögren
  • Thanks this is really helpful. I was using it in XNA, and I think I need to tone down the amount of times I call XDocument.Load, since I do it in every object and I don't need to. – redcodefinal Dec 22 '12 at 07:59
  • Updated with new link to the same article. – Karl-Johan Sjögren Aug 19 '15 at 11:41
  • @Karl-JohanSjögren The link is still dead. – silkfire Oct 12 '17 at 14:20
  • Not still, but it died again. It took me a whole minute to find the article on Google though so I've updated the link again. – Karl-Johan Sjögren Oct 12 '17 at 14:25
  • @Karl-JohanSjögren What keywords to find the article again? Link died again... – Anton Sep 28 '21 at 10:20
  • Wayback Machine to the rescue! I've updated the link again, this time to a cached copy that will hopefully be available close to forever. If it gets lost again it will probably be gone forever though, it was written back in 2008 and the whole blog platform has been gone since at least 2015. – Karl-Johan Sjögren Sep 28 '21 at 10:44
I do not think it continuously reads. XDocument.Load uses an XmlReader to read the XML into an in-memory tree, and once that tree has been built the file is no longer read. All further reads and modifications operate on the tree, which is why they are a lot faster than going back to disk. XDocument does not implement IDisposable, so there is nothing to dispose; the garbage collector reclaims the tree once it is no longer referenced.

nawfal
COLD TOLD