3

I need to read all first-level child nodes of the root node of a large XML file that looks like this:

<root>
 <record n="1"><a/><b/><c/></record>
 <record n="2"><a/><b/><c/></record>
 <record n="3"><a/><b/><c/></record>
</root>

And my code looks like:

var xml = XDocument.Load(filename);

var firstNode = xml?.Root?.Descendants()?.FirstOrDefault();

var elements = firstNode?.Elements();

I just need to get the first child of the root and all of its first-level child elements. This code works fine, but the question is: is it safe to read the file like this? I guess it does not load all the data into memory, only the structure of the XML file?

As far as I can see, memory usage does not increase while debugging. It only explodes if I actually inspect the contents of the xml variable.

Giorgi Nakeuri

2 Answers

4

No, XDocument loads the whole document into memory. Whether it's "safe" to do this or not depends on what size of document you need to be able to handle.
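One way to check this outside the debugger is to compare the managed heap size before and after the load. The following is only a rough sketch: `LoadMemoryDemo`, `WriteSampleFile` and `MeasureLoad` are hypothetical names made up for illustration, the generated file just mimics the structure from the question, and `GC.GetTotalMemory` gives an approximation rather than an exact figure.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

static class LoadMemoryDemo
{
    // Writes a hypothetical test file shaped like the sample in the question.
    public static void WriteSampleFile(string path, int recordCount)
    {
        using (var writer = new StreamWriter(path, false, Encoding.UTF8))
        {
            writer.WriteLine("<root>");
            for (int i = 1; i <= recordCount; i++)
            {
                writer.WriteLine(" <record n=\"" + i + "\"><a/><b/><c/></record>");
            }
            writer.WriteLine("</root>");
        }
    }

    // Returns the approximate growth of the managed heap caused by XDocument.Load.
    public static long MeasureLoad(string path)
    {
        long before = GC.GetTotalMemory(true); // force a collection first
        XDocument xml = XDocument.Load(path);  // builds the whole tree in memory
        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(xml); // keep the tree reachable until the second measurement
        return after - before;
    }
}
```

Calling `LoadMemoryDemo.WriteSampleFile(path, 100000)` and then `LoadMemoryDemo.MeasureLoad(path)` should report growth that scales with the number of records, even if the document is never inspected afterwards.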

If you need to handle XML files that wouldn't fit into memory, you'd want to use XmlReader, which is unfortunately considerably harder to use.
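For the case in the question (only the first child of the root, with tag names not known in advance), a minimal sketch could combine XmlReader with `XNode.ReadFrom`, so that only that one subtree is materialized. `FirstChildReader` and `ReadFirstChild` are hypothetical names used for illustration, and the sketch assumes the first child itself is small enough to fit in memory.

```csharp
using System.Xml;
using System.Xml.Linq;

static class FirstChildReader
{
    // Returns the first child element of the root as an XElement,
    // or null if the root has no child elements.
    // Only that one subtree is read into memory.
    public static XElement ReadFirstChild(XmlReader reader)
    {
        reader.MoveToContent(); // position on the root element
        if (reader.IsEmptyElement)
        {
            return null; // <root/> has no children
        }
        reader.Read(); // step inside the root

        // Skip whitespace, comments and text until the first element.
        while (!reader.EOF && reader.NodeType != XmlNodeType.Element)
        {
            if (reader.NodeType == XmlNodeType.EndElement)
            {
                return null; // reached the root's end tag without finding a child
            }
            reader.Read();
        }
        if (reader.EOF)
        {
            return null;
        }

        // Reads just this element (whatever its name is) and its subtree.
        return (XElement)XNode.ReadFrom(reader);
    }
}
```

With a reader created by `XmlReader.Create(filename)` inside a `using` block, `FirstChildReader.ReadFirstChild(reader)?.Elements()` then yields the first-level children of that first element, and the rest of the file is never parsed.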

Jon Skeet
  • Jon, can you please explain why memory usage does not increase immediately after the line `var xml = XDocument.Load(filename);`? It only increases when I try to inspect that variable. – Giorgi Nakeuri Mar 11 '17 at 07:50
  • I uploaded a video here: https://youtu.be/MPBmA5VGOjA You can see that the memory used is ~298 MB and increases to 299 MB after loading. But it explodes after viewing the variable. – Giorgi Nakeuri Mar 11 '17 at 08:01
  • @GiorgiNakeuri: Looking at the memory taken in a debugger is unreliable, as any number of extra things may be required for debugger interaction. In general, observing memory usage in .NET is tricky due to the way the garbage-collected heap works. But try loading a very large XML file (e.g. 500MB) using XDocument.Load and you'll see the usage go up... – Jon Skeet Mar 11 '17 at 08:29
1

I use a combination of XmlReader and LINQ to XML (XElement). I updated the code to get the first tag name dynamically.

using System;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
            // Dispose the reader so the file handle is released.
            using (XmlReader reader = XmlReader.Create(FILENAME))
            {
                reader.ReadStartElement(); // consume the root start tag
                reader.MoveToContent();    // skip whitespace and comments before the first child
                string recordName = null;  // taken from the first child element
                while (!reader.EOF)
                {
                    if (recordName == null)
                    {
                        if (reader.NodeType != XmlNodeType.Element)
                        {
                            break; // the root has no child elements
                        }
                        recordName = reader.Name;
                    }
                    else if (reader.NodeType != XmlNodeType.Element || reader.Name != recordName)
                    {
                        // advance to the next element with the same tag name
                        if (!reader.ReadToFollowing(recordName))
                        {
                            break;
                        }
                    }
                    // Materializes only this element and moves the reader past it.
                    XElement record = (XElement)XNode.ReadFrom(reader);
                    // process record here, e.g. record.Elements()
                }
            }
        }
    }
}
jdweng
  • Instead of "record" it can be "entry" or anything else. – Giorgi Nakeuri Mar 13 '17 at 12:19
  • The above code only handles one tag. The code would have to be modified to handle more than one tag. – jdweng Mar 13 '17 at 13:51
  • There are no 2 tags. Tag name is unknown and you can not assume that it is called "record" as in your code... – Giorgi Nakeuri Mar 13 '17 at 19:17
  • record is a string so you can replace with any string variable. – jdweng Mar 13 '17 at 19:32
  • I am not sure why it is so hard to understand my concerns. I cannot replace this string with any other string. I am creating an automation application that handles these XML files. Not only the tag names, but even the structure of the XML is unknown beforehand. The structure I provided is just the most common one. I cannot hardcode the tag name "record" in the application, nor can I hardcode any other possible tag name. It has to be dynamic. Is it still not clear? – Giorgi Nakeuri Mar 14 '17 at 06:51
  • A hard coded string can always be replaced with a variable. – jdweng Mar 14 '17 at 12:18
  • Are you kidding me? Don't you really understand the problem? I can not use any variable man. I don't know beforehand what will be the tag inside xml file. What I know is that I need FIRST child and all FIRST LEVEL DESCENDANTS of FIRST child. You DON'T KNOW tag names beforehand. You DON'T KNOW xml file structure beforehand. And you tell me "just use variable"... – Giorgi Nakeuri Mar 14 '17 at 14:44