
I have to parse an 80 GB XML file to extract some data from it, and I am using XmlReader for this. When I tested the code with a 304 MB file, it parsed the file within 4 seconds, so I thought it would work for the 80 GB file as well. However, it throws an out-of-memory exception after a few minutes.

I have the following code:

static void Main(string[] args)
    {

        List<Test> lstTest = new List<Test>();
        bool isTitle = false;
        bool isText = false;

        using (XmlReader Reader = XmlReader.Create(FilePath))
        {
            Test tt = new Test();
            while (Reader.Read())
            {
                switch (Reader.NodeType)
                {
                    case XmlNodeType.Element:
                        if (Reader.Name == "title")
                        {
                            isTitle = true;
                        }
                        if (Reader.Name == "text")
                        {
                            isText = true;
                        }
                        break;
                    case XmlNodeType.Text:
                        if (isTitle)
                        {
                            tt.Title = Reader.Value;
                            isTitle = false;
                        }

                        if (isText)
                        {
                            tt.Text = Reader.Value;
                            isText = false;
                        }
                        break;
                }

                if (tt.Text != null)
                {
                    lstTest.Add(tt);
                    tt = new Test();
                }
            }


        }
    }

Please suggest a solution. Thanks for your help.

user2247651
  • Can you check the size of the `lstTest` object as you go through each iteration? I could be wrong, but I think `lstTest` is growing so big that it's running out of memory. – Sai Puli Aug 25 '16 at 20:05
  • See http://stackoverflow.com/questions/15772031/how-to-parse-very-huge-xml-files-in-c – VDWWD Aug 25 '16 at 20:38

2 Answers


You are correct, XmlReader is the right way to go. And it's not the XmlReader that is running out of memory - it's your lstTest where you shove most nodes that you find.

The correct way to use XmlReader would be to process the nodes and then forget about them, moving on. You can write the results to the disk, or calculate some running totals, or whatever - but don't keep everything you read in memory - that defeats the very purpose of XmlReader.
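
As a minimal sketch of that "process and forget" idea, the helper below writes each record to a `TextWriter` the moment it is complete, so at most one title and one text value are held at a time. The class name `StreamingExtract`, the method `Extract`, and the tab-separated output (title plus the character count of the text) are assumptions made for illustration, not something from the question; adapt the write step to whatever you actually need from each record.

```csharp
using System;
using System.IO;
using System.Xml;

static class StreamingExtract
{
    // Streams <title>/<text> pairs from the reader and writes one line per
    // pair to the output as soon as the pair is complete. Nothing is
    // accumulated in a list. Returns the number of records written.
    public static long Extract(XmlReader reader, TextWriter output)
    {
        string current = null; // name of the element we are inside, if any
        string title = null;   // title of the record being assembled
        long count = 0;

        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    // An empty element such as <text/> has no content to wait for.
                    current = reader.IsEmptyElement ? null : reader.Name;
                    break;

                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                    if (current == "title")
                    {
                        title = reader.Value;
                    }
                    else if (current == "text")
                    {
                        // Hypothetical output format: title, tab, text length.
                        output.WriteLine(title + "\t" + reader.Value.Length);
                        count++;
                        title = null; // forget the record and move on
                    }
                    break;

                case XmlNodeType.EndElement:
                    current = null;
                    break;
            }
        }

        return count;
    }
}
```

To run it against a file, open the input with `XmlReader.Create(inputPath)` and the output with `new StreamWriter(outputPath)`, each in a `using` block, and pass both to `StreamingExtract.Extract`. Note that a single `text` value is still materialized as one string, so an individual element that is itself enormous would need chunked reading instead.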

Vilx-

You shouldn't load everything into memory; keep only the parts that interest you.

This can be done via IEnumerable<> and the yield return keyword:

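// Requires: using System.Collections.Generic; using System.Xml;
public IEnumerable<Test> ParseXml(string path)
{
    bool isTitle = false;
    bool isText = false;

    // Open the file passed in by the caller.
    using (XmlReader Reader = XmlReader.Create(path))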
    {
        Test tt = new Test();
        while (Reader.Read())
        {                    
            switch (Reader.NodeType)
            {
                case XmlNodeType.Element:
                    if (Reader.Name == "title")
                    {
                        isTitle = true;
                    }
                    if (Reader.Name == "text")
                    {
                        isText = true;
                    }
                    break;

                case XmlNodeType.Text:
                    if (isTitle)
                    {
                        tt.Title = Reader.Value;
                        isTitle = false;
                    }

                    if (isText)
                    {
                        tt.Text = Reader.Value;
                        isText = false;
                    }
                    break;
            }

            if (tt.Text != null)
            {
                yield return tt;
                tt = new Test();
            }
        }
    }
}

Usage:

var data = ParseXml(/* your xml file */);

// select the part that you are interested in (requires using System.Linq;)
var interestingTests = data
    .Where(x => x.Title == "...");

foreach (var test in interestingTests)
{
    // work with the interesting parts
}
Graham
Xiaoy312