
I have been storing .png files in one of the XML elements, as shown below. I convert the image file to a Base64 string and then copy that string into the XML. I was also able to read/load this XML file into my Windows Forms application. The issue I am facing is that as the XML file grows with more nodes, its size has become too large; right now it is 300 MB. When the Windows Forms application tries to read this large XML file, I get an OutOfMemoryException. Below is a snippet of my XML file.

<TestResult>
    <ResultsID>49</ResultsID>
    <DateExecuted>2018-02-20T09:36:12.787</DateExecuted>
    <UserExecuted>xxx</UserExecuted>
    <CorrectedMean>1966.32245</CorrectedMean>
    <CorrectedVariance>19525.6632019949</CorrectedVariance>
    <TestPassed>true</TestPassed>
    <TestResultImage>Qk2.......</TestResultImage>
</TestResult>
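For context, a Base64 string like the one in `<TestResultImage>` is typically produced with `Convert.ToBase64String`. Note that Base64 is an encoding, not a compression: every 3 raw bytes become 4 characters, so the payload grows by roughly a third, which is part of why the file reaches 300 MB. A minimal sketch (the byte values are illustrative):

```csharp
using System;

class Base64Demo
{
    static void Main()
    {
        // Pretend these are the first raw bytes of an image file.
        byte[] imageBytes = new byte[] { 0x42, 0x4D, 0x36 };

        // 3 raw bytes become 4 Base64 characters (~33% growth).
        string encoded = Convert.ToBase64String(imageBytes);
        Console.WriteLine(encoded);        // Qk02
        Console.WriteLine(encoded.Length); // 4

        // Decoding restores the original bytes exactly.
        byte[] decoded = Convert.FromBase64String(encoded);
        Console.WriteLine(decoded.Length); // 3
    }
}
```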

I have been trying to load the XML into .NET using the following code:

XDocument xmlResultsDoc = XDocument.Load("MeanData.xml");

and storing into my model class as below.

List<MeanVarianceTestResultsData> meanVarianceTestResultDataList =
    xmlResultsDoc.Descendants("TestResult")
        .Select(m => new MeanVarianceTestResultsData()
        {
            ResultsID = Convert.ToInt32(m.Element("ResultsID").Value),
            DateExecuted = Convert.ToDateTime(m.Element("DateExecuted").Value),
            UserExecuted = Convert.ToString(m.Element("UserExecuted").Value),
            CorrectedMean = Convert.ToString(m.Element("CorrectedMean").Value),
            CorrectedVariance = Convert.ToString(m.Element("CorrectedVariance").Value),
            TestPassed = Convert.ToBoolean(m.Element("TestPassed").Value),
            TestResultImage = Convert.FromBase64String(m.Element("TestResultImage").Value)
        })
        .ToList();

2 Answers


With huge XML files you must use XmlReader to prevent out-of-memory errors. Try the code below, which uses a combination of LINQ to XML and XmlReader:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
            using (XmlReader reader = XmlReader.Create(FILENAME))
            {
                while (!reader.EOF)
                {
                    if (reader.Name != "TestResult")
                    {
                        // Advance to the next TestResult element without
                        // loading the rest of the document into memory.
                        reader.ReadToFollowing("TestResult");
                    }
                    if (!reader.EOF)
                    {
                        // Materialize only this one element as an XElement.
                        XElement testResult = (XElement)XElement.ReadFrom(reader);
                        string image = (string)testResult.Element("TestResultImage");
                    }
                }
            }
        }
    }
}
jdweng
  • Thank you. I have one more question. One reason for my large xml files is due to the image i am saving into one of the elements. To store this image I am converting the image initially to byte[] and then converting it to Base64 string using the code Convert.ToBase64String(TestResultData.TestResultImage)). Here TestResultImage is the byte[]. Seems converting the byte[] to Base64 string is causing my XML file to bloat. Is there any other work around to Base64 to store the byte[] into XML? – Baba Mar 07 '18 at 14:35
  • Base64String does compression so it is probably the best method. Maybe serializing the image would give better compression. – jdweng Mar 07 '18 at 14:54
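A note on the comment above: Base64 does not compress; it expands data by about 33%. If the XML must keep embedding the bytes, one workaround is to compress the byte[] with GZipStream before Base64-encoding it. This is a sketch, not the answer's method, and whether it helps depends on how compressible the images are (PNG data is already compressed, so the gain may be small):

```csharp
using System;
using System.IO;
using System.IO.Compression;

class CompressedBase64
{
    // Compress raw bytes with GZip, then Base64-encode the result.
    static string ToCompressedBase64(byte[] data)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
                gzip.Write(data, 0, data.Length);
            return Convert.ToBase64String(buffer.ToArray());
        }
    }

    // Reverse: Base64-decode, then GZip-decompress.
    static byte[] FromCompressedBase64(string text)
    {
        using (var input = new MemoryStream(Convert.FromBase64String(text)))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }

    static void Main()
    {
        byte[] original = new byte[10000]; // highly compressible (all zeros)
        string encoded = ToCompressedBase64(original);
        byte[] roundTripped = FromCompressedBase64(encoded);
        Console.WriteLine(encoded.Length < original.Length); // True
        Console.WriteLine(roundTripped.Length);              // 10000
    }
}
```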

If your XML file is too large to load into memory at once, you can use XmlReader to stream through the file and load only small portions at a time. Further, for the <TestResultImage> element which contains very large Base64-encoded binary data, you can use XmlReader.ReadElementContentAsBase64(Byte[], Int32, Int32) to incrementally read the data in chunks and copy them into some Stream.

The following code shows how to accomplish this:


public class MeanVarianceTestResultsData
{
    public int ResultsID { get; set; }
    public DateTime DateExecuted { get; set; }
    public string UserExecuted { get; set; }
    public string CorrectedMean { get; set; }
    public string CorrectedVariance { get; set; }
    public bool TestPassed { get; set; }

    public string TestResultImageFile { get; set; }
    public Stream TestResultImage { get; set; }
}

public static class MeanVarianceTestResultsDataExtensions
{
    public static List<MeanVarianceTestResultsData> ReadResultListFrom(XmlReader reader, Func<MeanVarianceTestResultsData, Stream> openStream, Func<Stream, Stream> closeStream)
    {
        return reader.ReadSubtrees("TestResult").Select(r => ReadResultFrom(r, openStream, closeStream)).ToList();
    }

    public static MeanVarianceTestResultsData ReadResultFrom(XmlReader reader, Func<MeanVarianceTestResultsData, Stream> openStream, Func<Stream, Stream> closeStream)
    {
        if (reader == null || openStream == null)
            throw new ArgumentNullException();
        reader.MoveToContent();
        var result = new MeanVarianceTestResultsData();
        var isEmpty = reader.IsEmptyElement;
        // Read the root
        reader.Read();
        if (isEmpty)
            return result;
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.EndElement)
            {
                reader.Read();
                break;
            }
            else if (reader.NodeType != XmlNodeType.Element)
                // Comment, text, CDATA, etc.
                reader.Skip();
            else if (reader.Name == "ResultsID")
                result.ResultsID = reader.ReadElementContentAsInt();
            else if (reader.Name == "DateExecuted")
                result.DateExecuted = reader.ReadElementContentAsDateTime();
            else if (reader.Name == "UserExecuted")
                result.UserExecuted = reader.ReadElementContentAsString();
            else if (reader.Name == "CorrectedMean")
                result.CorrectedMean = reader.ReadElementContentAsString();
            else if (reader.Name == "CorrectedVariance")
                result.CorrectedVariance = reader.ReadElementContentAsString();
            else if (reader.Name == "TestPassed")
                result.TestPassed = reader.ReadElementContentAsBoolean();
            else if (reader.Name == "TestResultImage")
                result.TestResultImage = reader.ReadElementContentAsStream(() => openStream(result), closeStream);
            else
                reader.Skip();
        }
        return result;
    }
}

public static class XmlReaderExtensions
{
    public static Stream ReadElementContentAsStream(this XmlReader reader, Func<Stream> openStream, Func<Stream, Stream> closeStream)
    {
        if (reader == null || openStream == null)
            throw new ArgumentNullException();
        Stream stream = null;
        try
        {
            stream = openStream();
            byte[] buffer = new byte[4096];
            int readBytes = 0;
            while ((readBytes = reader.ReadElementContentAsBase64(buffer, 0, buffer.Length)) > 0)
            {
                stream.Write(buffer, 0, readBytes);
            }
        }
        finally
        {
            if (closeStream != null && stream != null)
                stream = closeStream(stream);
        }
        return stream;
    }

    public static IEnumerable<XmlReader> ReadSubtrees(this XmlReader reader, string name)
    {
        while (reader.ReadToFollowing(name))
        {
            using (var subReader = reader.ReadSubtree())
                yield return subReader;
        }
    }
}

And then you could use it as follows to read each TestResultImage into a MemoryStream:

List<MeanVarianceTestResultsData> results;
using (var reader = XmlReader.Create(fileName))
{
    results = MeanVarianceTestResultsDataExtensions.ReadResultListFrom(reader, m => new MemoryStream(), s => { s.Position = 0; return s; });
}

This will save substantial amounts of memory by completely skipping the intermediate Base64 string representation for the images - but it will still use quite a lot of memory for each MemoryStream. Alternatively, you could stream the images into some temporary files for later use, e.g. by doing the following:

List<MeanVarianceTestResultsData> results;
using (var reader = XmlReader.Create(fileName))
{
    results = MeanVarianceTestResultsDataExtensions.ReadResultListFrom(
        reader,
        m => { m.TestResultImageFile = Path.GetTempFileName(); return File.Open(m.TestResultImageFile, FileMode.Create); },
        s => { s.Dispose(); return null; });
}

In this case each stream is disposed after the image is written and the file name is stored in the MeanVarianceTestResultsData. (Of course, you could leave the streams open if you plan to immediately process them after deserialization.)

Sample fiddle.

dbc
  • Excellent! I will definitely give the above code a try. Thank you, friend. – Baba Mar 08 '18 at 14:50
  • Your logic was good, but unfortunately we decided not to store the images in the XML files any more. As the XML file keeps growing each day, the application was taking too long to load and process it. Instead, we decided to store a reference to the image in the XML element and keep the actual image on a server hard drive. With this change the application loads the XML files pretty quickly. – Baba Mar 14 '18 at 12:31
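The approach described in the last comment, keeping the image on disk and storing only a reference in the XML, can be sketched as follows. The directory, file name, and `TestResultImagePath` element name are illustrative assumptions, not taken from the original code:

```csharp
using System;
using System.IO;
using System.Xml.Linq;

class ImageReferenceDemo
{
    static void Main()
    {
        string imageDir = @"\\server\TestResultImages";  // hypothetical server share
        byte[] imageBytes = new byte[] { 0x42, 0x4D };   // stand-in for real .png data

        // Write the image to disk; the XML keeps only its path.
        string imagePath = Path.Combine(imageDir, "49.png");
        // File.WriteAllBytes(imagePath, imageBytes);    // enabled in real code

        var testResult = new XElement("TestResult",
            new XElement("ResultsID", 49),
            new XElement("TestPassed", true),
            new XElement("TestResultImagePath", imagePath)); // reference, not data

        Console.WriteLine(testResult.Element("TestResultImagePath").Value);
    }
}
```

The XML stays small regardless of image size, and each image is loaded from disk only when it is actually displayed.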