I'm seeing strange behavior when I parse XML with the XmlReader class alongside LINQ to XML. Test case below: whether I use (XElement)XNode.ReadFrom(xmlReader) or one of the Read() methods on XmlReader, it misses the second bar element in the input XML. If any whitespace is added between the </bar> and <bar>, the second bar element is parsed correctly.

Does anyone have an idea of why the input stream gets messed up and how to get around this problem?

    [Test]
    [Explicit]
    public void ShouldParseCorrectNumberOfElements()
    {
        var xml = @"<foo><bar>wtf</bar><bar>wtf2</bar></foo>";
        XmlReader xmlReader = XmlReader.Create(new MemoryStream(Encoding.UTF8.GetBytes(xml)));

        int count = 0;
        xmlReader.MoveToContent();
        while (xmlReader.Read())
        {
            if (xmlReader.NodeType == XmlNodeType.Element && xmlReader.Name == "bar")
            {
                var element = xmlReader.ReadOuterXml();
                Console.WriteLine("just got an " + element);
                count++;
            }
        }
        Assert.AreEqual(2, count);
    }
Joe Smith
  • The loop can be significantly optimized by using `ReadToFollowing("bar")` instead of `Read()` (works with Jon's answer too). – Dmitry Fedorkov Apr 22 '14 at 16:34
  • I have a similar case: I'm using `ReadToFollowing` in a `while` loop, with `ReadOuterXml` inside the loop. If the document is formatted with newlines, it works properly. With a single-line document, it skips all of the following nodes. – Mert Gülsoy Jan 12 '15 at 15:24

2 Answers

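You're calling ReadOuterXml, which consumes the element and leaves the "cursor" on the node that follows its end tag. With no whitespace between </bar> and <bar>, that node is the second bar element itself. The Read call in your loop condition then moves the cursor on (to the text node within the element) before your check runs, so the second element is never seen. With whitespace in between, ReadOuterXml stops on the whitespace node and Read then moves to the second <bar>, which is why that input works.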
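To see where the reader ends up, here is a minimal sketch that replays the loop from the question and records the node the reader is positioned on right after each ReadOuterXml call. The class and method names (`ReaderTrace`, `NodesAfterReadOuterXml`) are made up for illustration, and it reads from a StringReader rather than a MemoryStream purely for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of the original test case.
public static class ReaderTrace
{
    // Runs the question's loop unchanged and records, as "NodeType:Name",
    // the node the reader sits on immediately after each ReadOuterXml().
    public static List<string> NodesAfterReadOuterXml(string xml)
    {
        var landed = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "bar")
                {
                    reader.ReadOuterXml();
                    landed.Add(reader.NodeType + ":" + reader.Name);
                }
            }
        }
        return landed;
    }

    public static void Main()
    {
        string tight = "<foo><bar>wtf</bar><bar>wtf2</bar></foo>";
        string spaced = "<foo><bar>wtf</bar> <bar>wtf2</bar></foo>";
        Console.WriteLine(string.Join(", ", NodesAfterReadOuterXml(tight)));
        Console.WriteLine(string.Join(", ", NodesAfterReadOuterXml(spaced)));
    }
}
```

For the input without whitespace, the only recorded entry should be the second bar element, which the following Read() then steps over. For the spaced input, the first entry should be a whitespace node instead.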

Here's an alternative to your loop:

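    // ReadOuterXml() already advances the reader past the element,
    // so Read() is only called when nothing was consumed.
    while (!xmlReader.EOF)
    {
        Console.WriteLine(xmlReader.NodeType); // debug: show each node visited
        if (xmlReader.NodeType == XmlNodeType.Element && xmlReader.Name == "bar")
        {
            var element = xmlReader.ReadOuterXml();
            Console.WriteLine("just got an " + element);
            count++;
        }
        else
        {
            xmlReader.Read();
        }
    }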
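
The same pattern should carry over to the (XElement)XNode.ReadFrom(xmlReader) variant mentioned in the question, since ReadFrom also leaves the reader on the node after the element it consumed. The following is a self-contained sketch under that assumption; the names `BarReader` and `ReadBarValues` are hypothetical, and it needs a reference to System.Xml.Linq.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper illustrating the "read or advance, never both" loop.
public static class BarReader
{
    // Returns the text value of every <bar> element, including adjacent
    // ones with no whitespace between them.
    public static List<string> ReadBarValues(string xml)
    {
        var values = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "bar")
                {
                    // ReadFrom consumes the whole element and moves the
                    // reader on, so no extra Read() happens on this path.
                    var element = (XElement)XNode.ReadFrom(reader);
                    values.Add(element.Value);
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return values;
    }

    public static void Main()
    {
        var values = ReadBarValues("<foo><bar>wtf</bar><bar>wtf2</bar></foo>");
        Console.WriteLine(values.Count + " bar elements: " + string.Join(", ", values));
    }
}
```

The design choice is that each iteration either consumes an element or advances by one node, but never both, so an element that immediately follows another is not stepped over.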
Jon Skeet

Are you perhaps skipping a line by calling the Read() function within the while loop condition and then the ReadOuterXml() function within the loop itself?

Aaron