14

I know, I know, this has been done to death; I'm just asking to see whether this solution is still relevant now that we have .NET 4 and newer.

This link explains a simple way to read large XML files, and it uses LINQ. I quite like it and just want a simple answer on whether it is still relevant or whether there are better implementations in newer .NET code.

Draken
IEnumerable
  • As you can see on the site you linked, it mentions *LINQ to XML*, which I think is one of the easiest and fastest ways (in terms of writing code) to read and write XML documents. In fact, LINQ was introduced in *C# 3.0*, and it is a powerful way to write queries over collections and data sources. – Omar Oct 17 '12 at 10:30
  • @Fuex Using LINQ to XML out of the box will load the full document into memory, so although it is easy to *write* querying code, it doesn't make the performance any better. The example linked to, however, uses `XmlReader` in conjunction with LINQ, so it should work quite well. – James Oct 17 '12 at 11:04
  • @James Yes, I agree with you. Loading the entire document into memory becomes a problem when dealing with large files and will affect query performance. So using `XmlReader` in conjunction with *LINQ* is a good idea. – Omar Oct 17 '12 at 11:24
  • Thanks guys, good info. I'll use this method. Thanks heaps for the help. – IEnumerable Oct 18 '12 at 08:31

4 Answers

12

If your XML looks like this:

<root>
    <item>...</item>
    <item>...</item>
    ...
</root>

you can read the file with XmlReader and load each 'item' into its own XmlDocument, like this:

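// Requires: using System.Xml;
// `filename` is the path to your XML file.
XmlReader reader = XmlReader.Create(filename);

reader.ReadToDescendant("root");
bool onItem = reader.ReadToDescendant("item");

while (onItem)
{
    XmlDocument doc = new XmlDocument();
    doc.LoadXml(reader.ReadOuterXml()); // also moves the reader past </item>
    XmlNode item = doc.DocumentElement;

    // do your work with `item`

    // ReadOuterXml has already advanced the reader. If it now sits on the
    // next <item> (no whitespace between items), stay there; otherwise seek.
    onItem = (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
             || reader.ReadToNextSibling("item");
}

reader.Close();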

Because only one item is held in memory at a time, the size of the file is not a limit.

Stas BZ
  • @GreenGood, not correct! I tested this code many times and it works properly. 'reader.ReadToDescendant("item");' seeks the first element, 'reader.ReadOuterXml()' reads the current element, and 'reader.ReadToNextSibling("item")' goes to the next element. – Stas BZ Jul 22 '16 at 08:22
  • Is it possible to do this in parallel? – xSx Jan 18 '19 at 08:42
  • Also, if there is no whitespace between elements in your XML file, you should not call `ReadToNextSibling` after `ReadOuterXml`, because it will skip one item. So you have to check the current position, and if it is already at a node start, do not call `ReadToNextSibling`. – Stas BZ Apr 15 '19 at 06:15
11

The answer to this question hasn't changed in .NET 4 - for best performance you should still be using XmlReader as it streams the document instead of loading the full thing into memory.

The code you refer to uses XmlReader for the actual querying so should be reasonably quick on large documents.
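
As a rough illustration of that pattern (not the linked article's exact code), the sketch below streams matching elements with `XmlReader` and hands each one to LINQ to XML through `XNode.ReadFrom`. The class and method names (`XmlStreaming`, `StreamElements`) and the `TextReader` parameter are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class XmlStreaming
{
    // Hypothetical helper: lazily yields each element called `name`,
    // materializing only one element at a time.
    public static IEnumerable<XElement> StreamElements(TextReader input, string name)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node that follows it, so no extra Read() here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

A caller could then write something like `XmlStreaming.StreamElements(File.OpenText(path), "item").Where(e => ...)`, where `path` is a placeholder. Because the sequence is lazy, each `XElement` becomes eligible for collection once the query has moved past it, provided the query itself does not buffer the results.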

James
  • 80,725
  • 18
  • 167
  • 237
1

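The best way to do this is to read it node by node using a reader from XmlReader.Create.

// XmlReader itself has no WhitespaceHandling property; use XmlReaderSettings.
var settings = new XmlReaderSettings { IgnoreWhitespace = true };
using (var reader = XmlReader.Create(filename, settings))
{
    while (reader.Read())
    {
        // your code here.
    }
}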
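
To make the loop body concrete, here is a minimal sketch of one thing it might do: count how often each element name occurs while reading one node at a time. The `ElementCounter` class and its `Count` method are hypothetical names for this example, and whitespace is skipped here through `XmlReaderSettings.IgnoreWhitespace`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class ElementCounter
{
    // Hypothetical helper: tallies element names without ever holding
    // more than the current node in memory.
    public static Dictionary<string, int> Count(TextReader input)
    {
        var counts = new Dictionary<string, int>();
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            while (reader.Read())
            {
                // Only start tags (and empty elements) are counted.
                if (reader.NodeType != XmlNodeType.Element) continue;

                int seen;
                counts.TryGetValue(reader.Name, out seen); // seen stays 0 if absent
                counts[reader.Name] = seen + 1;
            }
        }
        return counts;
    }
}
```

Memory use depends on the number of distinct element names rather than on the size of the file.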
Trisped
Ekk
0

I struggled with the same issue for the last few days. What helped me: right-click the project, open Properties, go to the Build tab, select Any CPU as the platform target, untick the Prefer 32-bit option, and save before running your app.

Anjan Kant