Reading logic fails when it encounters an XML element with an empty value

Question

I have this code for reading an XML file:

using (XmlReader reader = XmlReader.Create(path + "\\AddressBook\\settings.xml"))
{
    while (reader.Read())
    {
        if (reader.IsStartElement())
        {
            if (reader.Name == "Person")
            {
                Person p = new Person();

                reader.Read();
                reader.Read();

                // Get name
                p.name = reader.Value;    
                reader.Read(); // skip end tag of previous element
                reader.Read(); // read start tag of email
                reader.Read(); // read email value

                // Get email
                p.email = reader.Value;    
                reader.Read();
                reader.Read();
                reader.Read();

                // Get street address
                p.address = reader.Value;
                reader.Read();
                reader.Read();
                reader.Read();

                // Get notes
                p.notes = reader.Value;    
                reader.Read();
                reader.Read();
                reader.Read();

                // Get dob
                p.dob = DateTime.Parse(reader.Value);

                lPeople.Add(p);
            }
        }
    }
}

It works in general, but it failed on this XML element:

<Person>
    <Name /> 
    <Email /> 
    <StreetAddress /> 
    <Notes>note</Notes> 
    <DateOfBirth>5/20/2015 8:04:15 PM</DateOfBirth> 
</Person>

It failed because, as you can see, when I added empty elements for name, email and address, no start and end tags were written (by XmlWriter), just <Name />. My reading logic above therefore read the wrong elements. What is the usual workaround for this?

Note: the answer from here that uses an extension method doesn't work for me for some reason; start and end tags are still not written for empty elements.

Update: the answer by Jon Hanna below is not working for me. If I have read some element, say "Email", then as soon as ReadElementContentAsString is called it seems to jump to another element, such as "Notes". By the way, this is the link to the XML on which J. Hanna's approach fails.

score 0 · Answer 1 · edited May 23 '17 at 11:44

A simple null check would do.

if (reader.Value == null)
    p.name = "";
else
    p.name = reader.Value;

That being said, this is not an ideal way of reading XML. You probably want to use LINQ to XML, which allows SQL-like query operations to retrieve data, or System.Xml.XmlDocument, which is older than LINQ to XML but still in use. Here is a tutorial for that.

This SO question also shows how to use XmlDocument to read XML. You could use that approach to load the data into a Person class; a null value is returned when no value is available. That way you do not have to hardcode your approach with manual reader.Read() calls, which will also fail if the XML differs slightly, with fewer or more elements.

This tutorial also goes over the basics of LINQ to XML with examples very similar to what you're trying to achieve.
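
As a rough illustration of the LINQ to XML route, here is a minimal sketch. PersonRecord and AddressBookLinq are hypothetical names made up for this example (the question's own Person class is not shown in full), and the en-US culture is an assumption based on the date format in the sample XML.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for the question's Person class.
public class PersonRecord
{
    public string Name = "";
    public string Email = "";
    public string Address = "";
    public string Notes = "";
    public DateTime DateOfBirth;
}

public static class AddressBookLinq
{
    // Parses every <Person> element found in the given XML text.
    public static List<PersonRecord> Parse(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("Person")
            .Select(e => new PersonRecord
            {
                // (string)element is null when the element is missing and ""
                // when it is empty (<Name />), so ?? "" covers both cases.
                Name = (string)e.Element("Name") ?? "",
                Email = (string)e.Element("Email") ?? "",
                Address = (string)e.Element("StreetAddress") ?? "",
                Notes = (string)e.Element("Notes") ?? "",
                // Assumes DateOfBirth is always present and in en-US format;
                // DateTime.Parse throws if it is missing or malformed.
                DateOfBirth = DateTime.Parse(
                    (string)e.Element("DateOfBirth"),
                    CultureInfo.GetCultureInfo("en-US"))
            })
            .ToList();
    }
}
```

Because each child is looked up by name, the order of the elements and whether they are written as <Name /> or <Name></Name> no longer matters.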

EDIT after OP's comments

If this naive approach is the way you want to proceed, perhaps to conform to some requirement such as an assignment, then keep doing what you're doing but add a switch statement to check which node you're dealing with.

if (reader.Value == null)
   continue;

switch(reader.Name)
{
   case "Email":
     person.Email = reader.Value;
     break;
   case "StreetAddress":
     person.Address = reader.Value;
     break;
}

answered May 20 '15 at 16:22 by matrixanomaly · edited May 23 '17 at 11:44 by Community

This will not work because things get corrupted: I may get the street address in the email field, because I always do one read to skip the end tag. – May 20 '15 at 16:27
Please refer to the links I have provided on reading XML documents. Your approach is hardcoded, and you end up skipping elements if the order is not exactly what you expect. LINQ to XML or XmlDocument is the way to go. Then you can do a null check. – matrixanomaly May 20 '15 at 16:29
Thanks, but that does not help. I asked for a solution to my situation. I know I can use another approach, but I don't want another approach. If you can help me with this situation, please do; otherwise your answer is not helpful. – May 20 '15 at 16:31
You still need to understand why your approach is not right. Jon Hanna's answer should work in your situation, but you need to understand that hardcoding is not the way to go. – matrixanomaly May 20 '15 at 16:43
That approach (by Jon Hanna) is not working either; I posted a sample. – May 20 '15 at 17:09

score 0 · Answer 2 · answered May 20 '15 at 16:32

It's not a matter of "workaround". You've written a parser that only accepts a given format, and have then passed in XML in another format.

For a start, if you can get empty elements, it would make sense to check reader.IsEmptyElement.

You'd still be expecting a certain number of text nodes between given elements though, which isn't very resilient.
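
To see what reader.IsEmptyElement reports, a small sketch along these lines may help. EmptyElementDemo and Describe are hypothetical names used only for illustration; the XmlReader members are the standard ones.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class EmptyElementDemo
{
    // Lists every start tag and whether it is a self-closing (empty) element.
    public static List<string> Describe(string xml)
    {
        var result = new List<string>();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // IsEmptyElement is true only for the <Name /> form;
                    // <Name></Name> still produces a separate end tag node.
                    result.Add(reader.Name +
                        (reader.IsEmptyElement ? ": empty" : ": has end tag"));
                }
            }
        }
        return result;
    }
}
```

For an empty element there is no text node and no end tag node to step over, so a reader that counts Read() calls has to skip those two reads whenever IsEmptyElement is true.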

I'd prefer something like:

using (var reader = XmlReader.Create(path + @"\AddressBook\settings.xml"))
  while (reader.Read())
    if (reader.IsStartElement() && reader.Name == "Person")
      people.Add(ReadPerson(reader.ReadSubtree()));

Where then ReadPerson() lets me worry about just that element at that point:

private static Person ReadPerson(XmlReader rdr)
{
  var person = new Person();
  using(rdr)
    while(rdr.Read())
      if(rdr.IsStartElement())
        switch(rdr.Name)
        {
          case "Name":
            person.Name = rdr.ReadElementContentAsString();
            break;
          case "Email":
            person.Email =  rdr.ReadElementContentAsString();
            break;
          case "StreetAddress":
            person.Address = rdr.ReadElementContentAsString();
            break;
          case "Notes":
            person.Notes = rdr.ReadElementContentAsString();
            break;
          case "DateOfBirth":
            person.DateOfBirth = DateTime.Parse(rdr.ReadElementContentAsString(), CultureInfo.GetCultureInfo("en-US"));
            break;
        }
  return person;
}

(Though if I knew who was responsible for this XML, I would beg them to replace the date-time format intended for human beings raised in the US with one intended for XML parsing software: 2015-05-20T20:04:15.)
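
If the writer side can be changed, a sketch like the following shows one way to read and write that format with XmlConvert. XmlDateDemo is a hypothetical helper name; whether the asker controls the code that writes the file is an assumption.

```csharp
using System;
using System.Xml;

public static class XmlDateDemo
{
    // Parses an XML Schema (ISO 8601) date-time such as 2015-05-20T20:04:15.
    public static DateTime Parse(string text) =>
        XmlConvert.ToDateTime(text, XmlDateTimeSerializationMode.RoundtripKind);

    // Formats a DateTime in the same culture-independent form.
    public static string Format(DateTime value) =>
        XmlConvert.ToString(value, XmlDateTimeSerializationMode.RoundtripKind);
}
```

Neither call depends on the current culture, so the file reads the same way on a machine with different regional settings.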

This is resilient in the face of re-ordering of elements, and different amounts of whitespace between elements, as well as empty elements.
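
One caveat, which may or may not be what the question's update describes (the asker's file is not shown here, so this is an assumption): ReadElementContentAsString() leaves the reader on the node after the end tag. If the elements are written with no whitespace between them, that node is already the next start tag, and a following Read() steps over it. A variant that only calls Read() when nothing was consumed avoids this. PersonEntry and PersonReader are hypothetical names for this sketch.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml;

// Hypothetical stand-in for the Person class.
public class PersonEntry
{
    public string Name = "", Email = "", Address = "", Notes = "";
    public DateTime DateOfBirth;
}

public static class PersonReader
{
    public static PersonEntry ReadPerson(XmlReader rdr)
    {
        var person = new PersonEntry();
        using (rdr)
        {
            while (!rdr.EOF)
            {
                if (rdr.NodeType != XmlNodeType.Element)
                {
                    rdr.Read(); // whitespace, end tags, initial state: step on
                    continue;
                }
                // Each ReadElementContentAsString() call already advances past
                // the element, so no extra Read() follows those cases.
                switch (rdr.Name)
                {
                    case "Name": person.Name = rdr.ReadElementContentAsString(); break;
                    case "Email": person.Email = rdr.ReadElementContentAsString(); break;
                    case "StreetAddress": person.Address = rdr.ReadElementContentAsString(); break;
                    case "Notes": person.Notes = rdr.ReadElementContentAsString(); break;
                    case "DateOfBirth":
                        person.DateOfBirth = DateTime.Parse(
                            rdr.ReadElementContentAsString(),
                            CultureInfo.GetCultureInfo("en-US"));
                        break;
                    default: rdr.Read(); break; // <Person> itself or unknown: descend
                }
            }
        }
        return person;
    }

    // Convenience wrapper for trying the method on a string.
    public static PersonEntry Parse(string personXml) =>
        ReadPerson(XmlReader.Create(new StringReader(personXml)));
}
```

The calling loop stays the same; only the inner loop changes from while (rdr.Read()) to an explicit check of where the reader currently is.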

I'd also certainly consider replacing people.Add(ReadPerson(reader.ReadSubtree())) with yield return ReadPerson(reader.ReadSubtree()). Then the calling code can create its own list if it needs one, but otherwise can use the Person objects as they come.
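
A minimal sketch of that iterator form follows. PeopleStream and ReadPeople are hypothetical names, and the per-person parsing is passed in as a delegate so the example stays self-contained; in the answer's code it would be ReadPerson.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class PeopleStream
{
    // Lazily yields one parsed item per <Person> element.
    public static IEnumerable<T> ReadPeople<T>(XmlReader reader, Func<XmlReader, T> readPerson)
    {
        using (reader)
        {
            while (reader.Read())
            {
                if (reader.IsStartElement() && reader.Name == "Person")
                {
                    T person;
                    // Disposing the subtree reader leaves the outer reader on
                    // the matching </Person> end tag, ready for the next Read().
                    using (var subtree = reader.ReadSubtree())
                    {
                        person = readPerson(subtree);
                    }
                    yield return person;
                }
            }
        }
    }
}
```

A caller that wants a list can write new List<T>(PeopleStream.ReadPeople(reader, readPerson)); a caller that only needs to loop can use foreach and never hold the whole file in memory.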

This does not seem to work. When I called this reading method, it somehow replaced the value of the "notes" element with an empty value?! (OK, the reading part didn't work. I know why the empty value was written: I saved the empty element again as it was read. But like I said, reading is not working.) — , May 20 '15 at 16:45
This reader is definitely not working; it is broken, sorry :(( — , May 20 '15 at 16:50
With that file (fixed to be XML) I get a single `Person` object, with `Notes` set to `"joni"` and `DateOfBirth` set to 20/05/2015 20:44:31 (very precise date of birth!). Could you put the real XML somewhere, as the codepad introduces nonsense `-` marks and an invalid offset before the XML declaration, so just to be sure that isn't the issue. — Jon Hanna, May 20 '15 at 16:56
Sure, thanks. Here it is: http://s000.tinyupload.com/?file_id=05275955763358844619 — , May 20 '15 at 16:58
This approach is not working for me; I added an explanation of why to my original question. — , May 21 '15 at 08:20
