ReadOuterXml is throwing OutOfMemoryException reading part of large (1 GB) XML file

Question

I am working on a large XML file, and while the application is running, the XmlTextReader.ReadOuterXml() method throws an OutOfMemoryException.

The code looks like this:

XmlTextReader xr = null;
try
{
    xr = new XmlTextReader(fileName);
    while (xr.Read() && success)
    {
        if (xr.NodeType != XmlNodeType.Element) 
            continue;
        switch (xr.Name)
        {
            case "A":
                var xml = xr.ReadOuterXml();
                var n = GetDetails(xml);
                break;
        }
    }
}
catch (Exception ex)
{
    //Do stuff
}

Using:

private int GetDetails (string xml)
{

    var rootNode = XDocument.Parse(xml);
    var xnodes = rootNode.XPathSelectElements("//A/B").ToList();
    //Then  working on list of nodes

}

When loading the XML file, the application throws the exception on the xr.ReadOuterXml() line. What can be done to avoid this? The XML file is almost 1 GB.

The loaded XML is simply too big. Consider using an iterator and yielding the results of `GetDetails`, which keeps memory usage low. — Fabio, Oct 06 '17 at 10:54
Hi @Fabio, yes, I can use iterators, but I believe that would affect performance. — Aniket, Oct 06 '17 at 11:52
Programming is always a tradeoff between speed and memory size. You need to choose one. — Fabio, Oct 06 '17 at 11:58

dbc · Accepted Answer · 2019-05-08T21:46:54.520

The most likely reason you are getting an OutOfMemoryException in ReadOuterXml() is that you are trying to read a substantial portion of the 1 GB XML document into a single string, and are hitting the maximum string length in .NET.
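As a rough back-of-the-envelope sketch of why a single string is a problem: .NET strings store UTF-16, so every char takes two bytes, and string.Length is an int. The class and method names below are hypothetical, and the check is only an upper bound; the runtime's actual maximum string length is lower and is not computed here.

```csharp
using System;

public static class StringSizeEstimate
{
    // Approximate payload size, in bytes, of a .NET string with the given number of chars.
    // Each char is a 2-byte UTF-16 code unit; object header overhead is ignored.
    public static long ApproxStringBytes(long charCount) => charCount * sizeof(char);

    // string.Length is an int, so no string can hold more chars than int.MaxValue.
    // This is only an upper bound; the runtime's real limit is lower.
    public static bool CannotFitInString(long charCount) => charCount > int.MaxValue;
}
```

For example, an element spanning 600 million ASCII characters of the file would need about 1.2 GB as a string, in one contiguous allocation.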

So, don't read the XML into a string. Instead, load directly from the XmlReader using XDocument.Load() with XmlReader.ReadSubtree():

using (var xr = XmlReader.Create(fileName))
{
    while (xr.Read() && success)
    {
        if (xr.NodeType != XmlNodeType.Element)
            continue;
        switch (xr.Name)
        {
            case "A":
                {
                    // Once the subtree reader is disposed, the outer reader is positioned on the EndElement
                    // of the element read, so the next call to Read() moves to the next node.
                    using (var subReader = xr.ReadSubtree())
                    {
                        var doc = XDocument.Load(subReader);
                        GetDetails(doc);
                    }
                }
                break;
        }
    }
}

And then in GetDetails() do:

private int GetDetails(XDocument rootDocument)
{
    var xnodes = rootDocument.XPathSelectElements("//A/B").ToList();
    //Then  working on list of nodes
    return xnodes.Count;
}

Not only will this use less memory, it will also be more performant. ReadOuterXml() uses a temporary XmlWriter to copy the XML in the input stream to an output StringWriter (which you then parse a second time). This version of the algorithm completely skips this extra work. It also avoids creating strings large enough to go on the large object heap which can cause additional performance issues.
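To check the reader positioning on something small, here is a minimal self-contained sketch that applies the same ReadSubtree() pattern to an in-memory document. The names SubtreeDemo and CountBPerA and the sample XML are illustrative assumptions, not part of the original code.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class SubtreeDemo
{
    // Loads one <A> element at a time and returns how many <B> children each one has.
    public static List<int> CountBPerA(string xml)
    {
        var counts = new List<int>();
        using (var xr = XmlReader.Create(new StringReader(xml)))
        {
            while (xr.Read())
            {
                if (xr.NodeType != XmlNodeType.Element || xr.Name != "A")
                    continue;

                // Only the current <A> subtree is materialized, never the whole document.
                using (var subReader = xr.ReadSubtree())
                {
                    var a = XElement.Load(subReader);
                    counts.Add(a.Elements("B").Count());
                }
            }
        }
        return counts;
    }
}
```

Calling SubtreeDemo.CountBPerA("<root><A><B/><B/></A><A><B/></A></root>") returns the counts 2 and 1, which shows that the outer loop picks up the second <A> after the first subtree has been consumed.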

If this is still using too much memory you will need to implement SAX-like parsing for your XML where you only load one element <B> at a time. First, introduce the following extension method:

public static partial class XmlReaderExtensions
{
    public static IEnumerable<XElement> WalkXmlElements(this XmlReader xmlReader, Predicate<Stack<XName>> filter)
    {
        Stack<XName> names = new Stack<XName>();

        while (xmlReader.Read())
        {
            if (xmlReader.NodeType == XmlNodeType.Element)
            {
                names.Push(XName.Get(xmlReader.LocalName, xmlReader.NamespaceURI));
                if (filter(names))
                {
                    using (var subReader = xmlReader.ReadSubtree())
                    {
                        yield return XElement.Load(subReader);
                    }
                }
            }

            if ((xmlReader.NodeType == XmlNodeType.Element && xmlReader.IsEmptyElement)
                || xmlReader.NodeType == XmlNodeType.EndElement)
            {
                names.Pop();
            }
        }
    }
}

Then, use it as follows:

using (var xr = XmlReader.Create(fileName))
{
    Predicate<Stack<XName>> filter =
        (stack) => stack.Peek().LocalName == "B" && stack.Count > 1 && stack.ElementAt(1).LocalName == "A";
    foreach (var element in xr.WalkXmlElements(filter))
    {
        //Then working on the specific node.
    }
}
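The filter relies on Stack<T> enumerating from the top down, so Peek() is the current element and ElementAt(1) is its parent. A small sketch of that behaviour follows, with the same predicate written as a method; PathFilterDemo, IsBUnderA, and PathOf are hypothetical helper names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PathFilterDemo
{
    // True when the current element is <B> and its direct parent is <A>.
    public static bool IsBUnderA(Stack<XName> stack) =>
        stack.Count > 1
        && stack.Peek().LocalName == "B"
        && stack.ElementAt(1).LocalName == "A";

    // Builds an element-name stack from a root-to-leaf path, pushing in document order.
    public static Stack<XName> PathOf(params string[] names)
    {
        var stack = new Stack<XName>();
        foreach (var name in names)
            stack.Push(XName.Get(name));
        return stack;
    }
}
```

With these helpers, IsBUnderA(PathOf("root", "A", "B")) is true, while IsBUnderA(PathOf("root", "B")) and IsBUnderA(PathOf("B")) are false.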

Thanks @dbc... This really helps. — Aniket, Oct 09 '17 at 13:34

Ritesh desale · Answer 2 · 2021-03-12T16:35:48.057


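using (var reader = XmlReader.Create(fileName))
{
    // Load(XmlReader) consumes the whole reader, so no Read() loop is needed.
    // Note that this builds the entire document in memory.
    XmlDocument oXml = new XmlDocument();
    oXml.Load(reader);
}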

For me, the code above resolved the issue: the reader's content is loaded into an XmlDocument through the XmlDocument.Load method.

edited Mar 12 '21 at 16:35

answered Mar 11 '21 at 15:21


Welcome to SO. What is `command`? How are you reading in the file? You need to be detailed when posting code in your solution. – Connor Low Mar 11 '21 at 16:16
@ConnorLow I have updated the code with the file reader. I was getting the 'System.OutOfMemoryException' while getting the data from the database, so I had added a SQL connection to the code, but somehow it was not saved properly – Ritesh desale Mar 12 '21 at 16:47
