103

I'm trying to read the following XML document as fast as I can and let additional classes manage the reading of each sub-block.

<ApplicationPool>
    <Accounts>
        <Account>
            <NameOfKin></NameOfKin>
            <StatementsAvailable>
                <Statement></Statement>
            </StatementsAvailable>
        </Account>
    </Accounts>
</ApplicationPool>

I'm trying to use the XmlReader object to read each Account and, within it, the "StatementsAvailable" element. Do you suggest calling XmlReader.Read, checking each element, and handling it as it comes?

I've thought of separating my classes to handle each node properly. So there's an AccountBase class that accepts an XmlReader instance and reads the NameOfKin and several other properties of the account. Then I want to iterate through the Statements and let another class fill itself out from each Statement (and subsequently add it to an IList).

Thus far I have the "per class" part done using XmlReader.ReadElementString(), but I can't work out how to tell the reader to move to the StatementsAvailable element, iterate through its children, and let another class read each of their properties.

Sounds easy!

mezoid
Gloria Huang
  • Click the orange question mark in the upper right corner of the edit box to get editing help. Probably you want to create a code block, which is done by first a blank line and then each line indented with four spaces. – Anders Abel Mar 14 '10 at 09:11
  • or just select your lines of code/XML and then click on the "code" button (101 010) in the editor toolbar - as simple as that! – marc_s Mar 14 '10 at 09:31

7 Answers

171

My experience of XmlReader is that it's very easy to accidentally read too much. I know you've said you want to read it as quickly as possible, but have you tried using a DOM model instead? I've found that LINQ to XML makes XML work much much easier.

If your document is particularly huge, you can combine XmlReader and LINQ to XML by creating an XElement from an XmlReader for each of your "outer" elements in a streaming manner: this lets you do most of the conversion work in LINQ to XML, but still only need a small portion of the document in memory at any one time. Here's some sample code (adapted slightly from this blog post):

static IEnumerable<XElement> SimpleStreamAxis(string inputUrl,
                                              string elementName)
{
  using (XmlReader reader = XmlReader.Create(inputUrl))
  {
    reader.MoveToContent();
    while (reader.Read())
    {
      if (reader.NodeType == XmlNodeType.Element)
      {
        if (reader.Name == elementName)
        {
          XElement el = XNode.ReadFrom(reader) as XElement;
          if (el != null)
          {
            yield return el;
          }
        }
      }
    }
  }
}

I've used this to convert the StackOverflow user data (which is enormous) into another format before - it works very well.

EDIT from radarbob, reformatted by Jon - although it's not quite clear which "read too far" problem is being referred to...

This should simplify the nesting and take care of the "a read too far" problem.

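using (XmlReader reader = XmlReader.Create(inputUrl))
{
    reader.ReadStartElement("theRootElement");
    reader.MoveToContent(); // skip any whitespace before the first child

    while (reader.NodeType == XmlNodeType.Element && reader.Name == "TheNodeIWant")
    {
        // ReadFrom consumes the element and leaves the reader on the node after it
        XElement el = (XElement) XNode.ReadFrom(reader);
        // process el here
        reader.MoveToContent(); // skip any whitespace before the next sibling
    }

    reader.ReadEndElement();
}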

This takes care of "a read too far" problem because it implements the classic while loop pattern:

initial read;
(while "we're not at the end") {
    do stuff;
    read;
}
ruffin
Jon Skeet
  • Calling XNode.ReadFrom reads the element and goes to the next one, then the following reader.Read() reads the next one again. You'd essentially miss an element if they happen to have the same name and be consecutive. – pbz Jul 29 '11 at 16:44
  • @pbz: Thanks. I'm not sure I trust myself to edit it correctly (that's how much I dislike XmlReader :) ). Are you able to edit it correctly? – Jon Skeet Jul 29 '11 at 16:48
  • @Jon: :-) I ended up doing something like Paul Alexander (below) recommended. Using a label (yuk) and making it jump to the first instruction in the "while" loop, and thus skipping an extra "read" "should" work, I think :-) – pbz Jul 29 '11 at 16:59
  • @JonSkeet - I might be missing something, but won't simply changing `if(reader.Name == elementName)` to `while(reader.Name == elementName)` fix the issue pointed out by pbz? – David McLean Apr 02 '14 at 10:18
  • @pbz: I changed the line `XElement el = XNode.ReadFrom(reader) as XElement;` to `XElement el = XElement.Load(reader.ReadSubtree());` since this fixes the bug that skips consecutive elements. – Dylan Hogg Mar 23 '16 at 02:02
  • As mentioned in other comments, the current version of `SimpleStreamAxis()` will skip elements when the XML is not indented, because `XNode.ReadFrom()` positions the reader at the next node **after** the element loaded, which will get skipped by the next unconditional `Read()`. If the next node is whitespace then all is well. Otherwise, not. For versions without this issue see [here](https://stackoverflow.com/a/19165632/3744182), [here](https://stackoverflow.com/a/18282052/3744182) or [here](https://stackoverflow.com/a/46778413/3744182). – dbc Jan 10 '18 at 00:08
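
The ReadSubtree() fix suggested in the comments above can be sketched roughly as follows. This is only an illustration, not the answer's actual code: the class name XmlStreaming and the method name StreamElements are made up here, and the method takes an already-created XmlReader instead of a URL.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public static class XmlStreaming
{
    // Hypothetical helper: yields every element named elementName below the root,
    // loading one element at a time.
    public static IEnumerable<XElement> StreamElements(XmlReader reader, string elementName)
    {
        reader.MoveToContent(); // position on the root element
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
            {
                XElement el;
                // Closing the subtree reader leaves the outer reader on this element's
                // end tag (or on the element itself if it is empty), so the Read() in
                // the loop condition advances exactly once and no sibling is skipped.
                using (XmlReader subtree = reader.ReadSubtree())
                {
                    el = XElement.Load(subtree);
                }
                yield return el;
            }
        }
    }
}
```

The difference from XNode.ReadFrom is where the outer reader ends up: on the matched element's own end tag rather than on the node after it, so the unconditional Read() is safe even when matching elements are adjacent with no whitespace between them.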
33

Three years later, perhaps with the renewed emphasis on WebApi and XML data, I came across this question. Since codewise I am inclined to follow Skeet out of an airplane without a parachute, and seeing his initial code doubly corroborated by the MS XML team article as well as an example in BOL (Streaming Transform of Large Xml Docs), I too quickly overlooked the other comments, most specifically the one from 'pbz', who pointed out that if elements with the same name appear in succession, every other one is skipped because of the double read. In fact, the BOL and MS blog articles both parsed source documents whose target elements were nested deeper than the second level, which masked this side effect.

The other answers address this problem. I just wanted to offer a slightly simpler revision that seems to work well so far, and that takes into account that the XML might come from different sources, not just a URI, so the extension works on a caller-managed XmlReader. The one assumption is that the reader is in its initial state, since otherwise the first 'Read()' might advance past a desired node:

public static IEnumerable<XElement> ElementsNamed(this XmlReader reader, string elementName)
{
    reader.MoveToContent(); // will not advance reader if already on a content node; if successful, ReadState is Interactive
    reader.Read();          // this is needed, even with MoveToContent and ReadState.Interactive
    while(!reader.EOF && reader.ReadState == ReadState.Interactive)
    {
        // corrected for bug noted by Wes below...
        if(reader.NodeType == XmlNodeType.Element && reader.Name.Equals(elementName))
        {
             // this advances the reader...so it's either XNode.ReadFrom() or reader.Read(), but not both
             var matchedElement = XNode.ReadFrom(reader) as XElement;
             if(matchedElement != null)
                 yield return matchedElement;
        }
        else
            reader.Read();
    }
}
mdisibio
  • Your "if(reader.Name.Equals(elementName))" statement is missing a corresponding "else reader.Read();" statement. If the element is not the one you want, you want to continue reading. That's what I had to add to get it working for me. – Wes Mar 21 '14 at 16:29
  • @Wes Fixed the issue by collapsing the two conditionals (NodeType and Name) so that the `else Read()` applies to both. Thanks for catching that. – mdisibio Mar 21 '14 at 20:44
  • I upvoted you, but I'm not very happy to see the Read method call written twice. Maybe you could use a do while loop here? :) – nawfal Aug 20 '15 at 06:38
  • Another answer that noticed and resolved the same problem with the MSDN docs: https://stackoverflow.com/a/18282052/3744182 – dbc Jan 09 '18 at 19:17
17

We do this kind of XML parsing all the time. The key is defining where the parsing method will leave the reader on exit. If every method always leaves the reader on the node immediately following the element it was asked to read, then you can safely and predictably read through the XML stream. So if the reader is positioned on the <Account> element when parsing starts, then after parsing it will be positioned on whatever follows </Account> (in the sample document, the </Accounts> closing tag).

The parsing code looks something like this:

using System.Xml;

public class Account
{
    string _accountId;
    string _nameOfKin;
    Statements _statementsAvailable;

    public void ReadFromXml( XmlReader reader )
    {
        reader.MoveToContent();

        // Read node attributes
        _accountId = reader.GetAttribute( "accountId" );
        ...

        // An empty element has no children: step past it and stop
        if( reader.IsEmptyElement ) { reader.Read(); return; }

        reader.Read();
        while( ! reader.EOF )
        {
            if( reader.IsStartElement() )
            {
                switch( reader.Name )
                {
                    // Read element for a property of this class
                    case "NameOfKin":
                        _nameOfKin = reader.ReadElementContentAsString();
                        break;

                    // Starting sub-list
                    case "StatementsAvailable":
                        _statementsAvailable = new Statements();
                        _statementsAvailable.ReadFromXml( reader );
                        break;

                    // Unknown element: skip it together with its children
                    default:
                        reader.Skip();
                        break;
                }
            }
            else
            {
                // Not a start element (normally this element's closing tag):
                // step past it and finish
                reader.Read();
                break;
            }
        }
    }
}

The Statements class just reads in the <StatementsAvailable> node

public class Statements
{
    List<Statement> _statements = new List<Statement>();

    public void ReadFromXml( XmlReader reader )
    {
        reader.MoveToContent();
        if( reader.IsEmptyElement ) { reader.Read(); return; }

        reader.Read();
        while( ! reader.EOF )
        {
            if( reader.IsStartElement() )
            {
                if( reader.Name == "Statement" )
                {
                    var statement = new Statement();
                    statement.ReadFromXml( reader );
                    _statements.Add( statement );               
                }
                else
                {
                    reader.Skip();
                }
            }
            else
            {
                reader.Read();
                break;
            }
        }
    }
}

The Statement class would look very much the same

public class Statement
{
    string _statementId;

    public void ReadFromXml( XmlReader reader )
    {
        reader.MoveToContent();

        // Read node attributes
        _statementId = reader.GetAttribute( "statementId" );
        ...

        if( reader.IsEmptyElement ) { reader.Read(); return; }

        reader.Read();
        while( ! reader.EOF )
        {           
            ....same basic loop
        }       
    }
}
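
To see the "leave the reader just past the element you were given" contract end to end, here is a compact sketch that runs against the question's document. It is not the answer's code: the names AccountSketch, StatementSketch and ReadAll are hypothetical, it assumes each <Statement> holds only text, and it uses ReadEndElement() rather than a plain Read() so that unexpected content fails loudly instead of silently desynchronising the reader.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical, trimmed-down classes; element names follow the question's XML.
public class StatementSketch
{
    public string Text;

    public void ReadFromXml(XmlReader reader)
    {
        // Consumes the whole <Statement> element, including its end tag.
        Text = reader.ReadElementContentAsString();
    }
}

public class AccountSketch
{
    public string NameOfKin;
    public List<StatementSketch> Statements = new List<StatementSketch>();

    public void ReadFromXml(XmlReader reader)
    {
        reader.MoveToContent();                          // on <Account>
        if (reader.IsEmptyElement) { reader.Read(); return; }

        reader.Read();                                   // step inside <Account>
        while (reader.MoveToContent() == XmlNodeType.Element)
        {
            switch (reader.Name)
            {
                case "NameOfKin":
                    NameOfKin = reader.ReadElementContentAsString();
                    break;
                case "StatementsAvailable":
                    ReadStatements(reader);
                    break;
                default:
                    reader.Skip();                       // unknown child: skip it whole
                    break;
            }
        }
        reader.ReadEndElement();                         // step past </Account>
    }

    private void ReadStatements(XmlReader reader)
    {
        if (reader.IsEmptyElement) { reader.Read(); return; }

        reader.Read();                                   // step inside <StatementsAvailable>
        while (reader.MoveToContent() == XmlNodeType.Element)
        {
            if (reader.Name == "Statement")
            {
                var statement = new StatementSketch();
                statement.ReadFromXml(reader);
                Statements.Add(statement);
            }
            else
            {
                reader.Skip();
            }
        }
        reader.ReadEndElement();                         // step past </StatementsAvailable>
    }

    // Reads every <Account> under the first <Accounts> element.
    public static List<AccountSketch> ReadAll(XmlReader reader)
    {
        var accounts = new List<AccountSketch>();
        if (!reader.ReadToFollowing("Accounts") || reader.IsEmptyElement) return accounts;

        reader.Read();                                   // step inside <Accounts>
        while (reader.MoveToContent() == XmlNodeType.Element && reader.Name == "Account")
        {
            var account = new AccountSketch();
            account.ReadFromXml(reader);
            accounts.Add(account);
        }
        return accounts;
    }
}
```

Because every method finishes just past its own element, the caller's loop never needs an extra Read() between siblings, which is where the "read too far" bugs usually come from.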
crush
Paul Alexander
8

For sub-objects, ReadSubtree() gives you an XmlReader limited to the sub-object, but I really think that you are doing this the hard way. Unless you have very specific requirements for handling unusual / unpredictable XML, use XmlSerializer (perhaps coupled with sgen.exe if you really want).

XmlReader is... tricky. Contrast that with:

using System;
using System.Collections.Generic;
using System.Xml.Serialization;
public class ApplicationPool {
    private readonly List<Account> accounts = new List<Account>();
    public List<Account> Accounts {get{return accounts;}}
}
public class Account {
    public string NameOfKin {get;set;}
    private readonly List<Statement> statements = new List<Statement>();
    public List<Statement> StatementsAvailable {get{return statements;}}
}
public class Statement {}
static class Program {
    static void Main() {
        XmlSerializer ser = new XmlSerializer(typeof(ApplicationPool));
        ser.Serialize(Console.Out, new ApplicationPool {
            Accounts = { new Account { NameOfKin = "Fred",
                StatementsAvailable = { new Statement {}, new Statement {}}}}
        });
    }
}
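
The sample above only serializes. Going the other way, from the question's XML to objects, might look like the sketch below. The type names PoolDto, AccountDto and PoolLoader are hypothetical, and modelling each Statement as a plain string is an assumption, since the question does not show what a Statement contains.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical DTOs; the attributes map them onto the element names in the question.
[XmlRoot("ApplicationPool")]
public class PoolDto
{
    [XmlArray("Accounts"), XmlArrayItem("Account")]
    public List<AccountDto> Accounts = new List<AccountDto>();
}

public class AccountDto
{
    public string NameOfKin;

    // Assumption: each <Statement> is treated as plain text.
    [XmlArray("StatementsAvailable"), XmlArrayItem("Statement")]
    public List<string> StatementsAvailable = new List<string>();
}

public static class PoolLoader
{
    // Deserializes a whole <ApplicationPool> document from any TextReader.
    public static PoolDto Load(TextReader input)
    {
        var serializer = new XmlSerializer(typeof(PoolDto));
        return (PoolDto)serializer.Deserialize(input);
    }
}
```

A call such as PoolLoader.Load(new StringReader(xml)) builds the whole object graph in memory, so this trades the streaming behaviour of XmlReader for much less code.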
Marc Gravell
  • Seems like MS also suggests XmlSerializer as the default now: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/serialization/how-to-read-object-data-from-an-xml-file – Jan Feb 21 '22 at 12:05
5

The following example navigates through the stream to determine the current node type, and then uses XmlWriter to output the XmlReader content.

    StringBuilder output = new StringBuilder();

    String xmlString =
            @"<?xml version='1.0'?>
            <!-- This is a sample XML document -->
            <Items>
              <Item>test with a child element <more/> stuff</Item>
            </Items>";
    // Create an XmlReader
    using (XmlReader reader = XmlReader.Create(new StringReader(xmlString)))
    {
        XmlWriterSettings ws = new XmlWriterSettings();
        ws.Indent = true;
        using (XmlWriter writer = XmlWriter.Create(output, ws))
        {

            // Parse the file and display each of the nodes.
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        writer.WriteStartElement(reader.Name);
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(reader.Value);
                        break;
                    case XmlNodeType.XmlDeclaration:
                    case XmlNodeType.ProcessingInstruction:
                        writer.WriteProcessingInstruction(reader.Name, reader.Value);
                        break;
                    case XmlNodeType.Comment:
                        writer.WriteComment(reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        writer.WriteFullEndElement();
                        break;
                }
            }

        }
    }
    OutputTextBlock.Text = output.ToString();

The following example uses the XmlReader methods to read the content of elements and attributes.

StringBuilder output = new StringBuilder();

String xmlString =
    @"<bookstore>
        <book genre='autobiography' publicationdate='1981-03-22' ISBN='1-861003-11-0'>
            <title>The Autobiography of Benjamin Franklin</title>
            <author>
                <first-name>Benjamin</first-name>
                <last-name>Franklin</last-name>
            </author>
            <price>8.99</price>
        </book>
    </bookstore>";

// Create an XmlReader
using (XmlReader reader = XmlReader.Create(new StringReader(xmlString)))
{
    reader.ReadToFollowing("book");
    reader.MoveToFirstAttribute();
    string genre = reader.Value;
    output.AppendLine("The genre value: " + genre);

    reader.ReadToFollowing("title");
    output.AppendLine("Content of the title element: " + reader.ReadElementContentAsString());
}

OutputTextBlock.Text = output.ToString();
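
The same family of navigation methods can also walk the question's Statement elements. The sketch below is an illustration under assumptions: the `id` attribute and the StatementIds class are invented for the example. It deliberately reads only an attribute inside the loop, because a call that consumes the element (such as ReadElementContentAsString) would move the reader and could make ReadToNextSibling step over an adjacent sibling.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StatementIds
{
    // Collects the (hypothetical) id attribute of every <Statement>
    // under the first <StatementsAvailable> element.
    public static List<string> Read(string xml)
    {
        var ids = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            if (reader.ReadToFollowing("StatementsAvailable") &&
                reader.ReadToDescendant("Statement"))
            {
                do
                {
                    // GetAttribute does not move the reader, so the sibling
                    // navigation in the loop condition stays in step.
                    ids.Add(reader.GetAttribute("id"));
                }
                while (reader.ReadToNextSibling("Statement"));
            }
        }
        return ids;
    }
}
```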
Muhammad Awais
0
    XmlDataDocument xmldoc = new XmlDataDocument();
    XmlNodeList xmlnode ;
    int i = 0;
    string str = null;
    FileStream fs = new FileStream("product.xml", FileMode.Open, FileAccess.Read);
    xmldoc.Load(fs);
    xmlnode = xmldoc.GetElementsByTagName("Product");

You can then loop through xmlnode and read the data from each node. Source: C# XML Reader

Olle Sjögren
  • This class is deprecated. Do not use. – nawfal Aug 20 '15 at 06:36
  • @Elvarism The website you shared shows many other ways to read XML, and that helped me a lot. I'll vote you up. Here is another easy-to-understand [XmlReader](https://www.dotnetperls.com/xmlreader) example. – 劉鎮瑲 Feb 18 '19 at 08:48
0

I am not very experienced, but I think XmlReader is unnecessary here. It is very hard to use.
XElement is very easy to use.
If you need better performance (faster reads), you would have to change the file format and use the StreamReader and StreamWriter classes.

Mehmet