I need to iterate through some XML that has an unusual format. It looks like this:

    <Baseball>
        <Player playerID="123" playerName="John Smith" playerBats="Right"/>
        <position positionID="1b" positionCode="abc" counter="3"/>
        <position positionID="2b" positionCode="def" counter="2"/>
        </Player>
    </Baseball>

I cannot change the format I am given. I need to iterate through every line, pulling out different pieces and sending them to different places. I will be writing the code in C#. Ideas? Thank you!

Bradley Uffner
Mike Smith
  • Does the "unusual format" include the missing closing `>` in the `position` tag with `positionID="2b"` and the missing opening `"` in the `Player` named `John Smith`? – Ian May 05 '17 at 02:19
  • The text sample in the post is not valid XML, so if it is indeed a representative sample, stop calling it "XML" and use basic text-scanning tools to extract data from it. Otherwise, [edit] the post to update the sample and also show the code you've tried to read it with. – Alexei Levenkov May 05 '17 at 02:20
  • Thank you both for replying. I made some small edits to fix things I overlooked. The document comes to me as an XmlDocument and it does come in this unusual form – Mike Smith May 05 '17 at 02:32
  • I fail to see anything unusual about this XML. – Bradley Uffner May 05 '17 at 02:41
  • Agreed with @BradleyUffner this XML is not unusual – Brian Ogden May 05 '17 at 02:41
  • You probably want to look at the `XmlDocument` class – Ian May 05 '17 at 02:44
  • Search the internet for "load XML document with XmlDocument class", there are many examples, including on Stack Overflow – Brian Ogden May 05 '17 at 02:45
  • Possible duplicate of [Iterating through all nodes in XML file](http://stackoverflow.com/questions/2915294/iterating-through-all-nodes-in-xml-file) – Brian Ogden May 05 '17 at 02:46
  • @BradleyUffner - it was, before OP edited it. Now, a simple XML parser can handle it. – J.N. May 05 '17 at 02:46
  • @J.N. OP states in his comments, after making "some small edits", that the XmlDocument remains "in this unusual form" – Brian Ogden May 05 '17 at 02:48
  • @MikeSmith - You should give us the *actual* input, without any manual modifications, so that we can help you make code that can handle the input. Manually correcting the input is likely not the way you want to go, before inputting it to your application. – J.N. May 05 '17 at 02:51
  • Actually, I either see what is "unusual" now, or there was another mistake transcribing the xml to the site. Is the `Player` element supposed to be self closing? If so, that would make it invalid XML, as there is an explicit closing tag for `Player` at the end also. – Bradley Uffner May 05 '17 at 02:54
  • So this is the actual code (I truncated 1000s of lines of it) that comes over as an XmlDocument. There are no more typos. The lack of appropriate, normal-looking tags is my problem. – Mike Smith May 05 '17 at 03:13
  • What lack of "appropriate, normal-looking tags"? It's really unclear what you think is unusual about this XML. – Jon Skeet May 05 '17 at 16:27

1 Answer

Assuming your input XML actually IS valid XML, this is the pattern I use for this type of thing.

Your example XML isn't valid because `Player` is both self-closing and has an explicit closing tag. I've adjusted it to my best guess of what it should look like.

If that really is the XML you have to deal with, `XmlDocument.LoadXml` will throw an `XmlException`. You will need to find some other way to deal with the malformed data, perhaps pre-processing it to remove the `/` on the `Player` element so it is no longer self-closing.
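A minimal sketch of that pre-processing step, assuming the raw text is available as a string before it is parsed and that no attribute value contains a `>` character. The class and method names (`MalformedPlayerFix`, `OpenSelfClosedPlayers`) are hypothetical, and a regular expression is only one way to do this:

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

static class MalformedPlayerFix
{
    // Hypothetical helper: rewrites <Player ... /> as <Player ... > so that the
    // explicit </Player> later in the text becomes its matching closing tag.
    // Assumes attribute values never contain '>'.
    public static string OpenSelfClosedPlayers(string rawXml)
    {
        return Regex.Replace(rawXml, @"<Player\b([^>]*?)\s*/>", "<Player$1>");
    }
}

class PreprocessDemo
{
    static void Main()
    {
        string raw =
@"<Baseball>
    <Player playerID=""123"" playerName=""John Smith"" playerBats=""Right""/>
    <position positionID=""1b"" positionCode=""abc"" counter=""3""/>
    <position positionID=""2b"" positionCode=""def"" counter=""2""/>
    </Player>
</Baseball>";

        var document = new XmlDocument();
        // Loading the raw text directly would throw an XmlException.
        document.LoadXml(MalformedPlayerFix.OpenSelfClosedPlayers(raw));

        // Each <position> is now a child of <Player>.
        foreach (XmlElement position in document.SelectNodes("/Baseball/Player/position"))
        {
            Console.WriteLine(position.GetAttribute("positionID"));
        }
    }
}
```

This only helps if every self-closed `Player` really is followed by its own `</Player>`; if the input mixes both forms, a more careful scan would be needed.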

The basic pattern is that there is a class for each element in the XML. Each class has a static function `FromXml` that accepts the `XmlElement` for the matching element in the XML. `FromXml` is responsible for reading and parsing the element's attributes and populating the class's properties from them. Child elements are processed by calling `FromXml` on their associated classes.

    using System;
    using System.Collections.Generic;
    using System.Xml;

    class Program
    {
        static void Main(string[] args)
        {
            string xml =
    @"<Baseball>
        <Player playerID=""123"" playerName=""John Smith"" playerBats=""Right"">
            <position positionID=""1b"" positionCode=""abc"" counter=""3""/>
            <position positionID=""2b"" positionCode=""def"" counter=""2""/>
        </Player>
    </Baseball>";

            var document = new XmlDocument();
            document.LoadXml(xml);

            var players = new List<Player>();

            foreach (XmlElement baseballElement in document.SelectNodes("Baseball"))
            {
                foreach (XmlElement playerElement in baseballElement.SelectNodes("Player"))
                {
                    players.Add(Player.FromXml(playerElement));
                }
            }
            Console.ReadLine();
        }
    }

    public class Player
    {
        // Builds a Player from a <Player> element, including its child <position> elements.
        public static Player FromXml(XmlElement playerElement)
        {
            var player = new Player();
            player.PlayerId = int.Parse(playerElement.GetAttribute("playerID"));
            player.PlayerName = playerElement.GetAttribute("playerName");
            player.PlayerBats = playerElement.GetAttribute("playerBats");

            foreach (XmlElement positionElement in playerElement.SelectNodes("position"))
            {
                player.Positions.Add(Position.FromXml(positionElement));
            }
            return player;
        }

        public int PlayerId { get; set; }

        public string PlayerName { get; set; }

        public string PlayerBats { get; set; }

        private List<Position> _positions = new List<Position>();
        public List<Position> Positions
        {
            get { return _positions; }
        }
    }

    public class Position
    {
        public static Position FromXml(XmlElement positionElement)
        {
            var position = new Position();
            position.PositionId = positionElement.GetAttribute("positionID");
            position.PositionCode = positionElement.GetAttribute("positionCode");
            position.Counter = int.Parse(positionElement.GetAttribute("counter"));

            return position;
        }

        public string PositionId { get; set; }
        public string PositionCode { get; set; }
        public int Counter { get; set; }
    }

This will create a list of `Player` objects, each of which contains a list of `Position` objects, all populated from the XML.

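I also haven't done any kind of error checking on the input XML. `GetAttribute` returns an empty string for a missing attribute, so a missing or non-numeric `playerID` or `counter` makes `int.Parse` throw a `FormatException`, while a missing string attribute is silently stored as an empty string.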
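If you want the error checking, one possible shape is a `TryFromXml` variant that reports failure instead of throwing. This is only a sketch: `TryFromXml` is a hypothetical name that is not part of the code above, and treating every attribute as required is an assumption about your data. It sticks to C# 5 syntax.

```csharp
using System;
using System.Xml;

public class Position
{
    public string PositionId { get; set; }
    public string PositionCode { get; set; }
    public int Counter { get; set; }

    // Hypothetical variant of FromXml: returns false instead of throwing
    // when an attribute is missing or "counter" is not a valid integer.
    public static bool TryFromXml(XmlElement positionElement, out Position position)
    {
        position = null;

        // Assumption: both string attributes are required.
        if (!positionElement.HasAttribute("positionID") ||
            !positionElement.HasAttribute("positionCode"))
        {
            return false;
        }

        // GetAttribute returns "" for a missing attribute, which TryParse rejects.
        int counter;
        if (!int.TryParse(positionElement.GetAttribute("counter"), out counter))
        {
            return false;
        }

        position = new Position
        {
            PositionId = positionElement.GetAttribute("positionID"),
            PositionCode = positionElement.GetAttribute("positionCode"),
            Counter = counter
        };
        return true;
    }
}

class ErrorCheckingDemo
{
    static void Main()
    {
        var document = new XmlDocument();
        document.LoadXml(
            @"<Player>
                <position positionID=""1b"" positionCode=""abc"" counter=""3""/>
                <position positionID=""2b"" positionCode=""def"" counter=""oops""/>
              </Player>");

        foreach (XmlElement element in document.DocumentElement.SelectNodes("position"))
        {
            Position position;
            if (Position.TryFromXml(element, out position))
            {
                Console.WriteLine("Parsed " + position.PositionId);
            }
            else
            {
                Console.WriteLine("Skipping malformed element: " + element.OuterXml);
            }
        }
    }
}
```

Whether to skip, log, or abort on a bad element depends on where the data is going; the caller decides because the method itself never throws for bad attributes.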

Bradley Uffner
  • I was pretty excited to see your response. When I first started reading the code, one line stood out, and it turns out my IDE doesn't much care for it either. It is: `public List<Position> Positions { get; } = new List<Position>();` How do you split this up for C# 5? Thank you! – Mike Smith May 05 '17 at 14:58
  • Change that property to a "normal" read-only property with a backing field, and initialize the field in the constructor (which you will also have to add). I'm on mobile now, so I can't update the code. If you would like me to update it once I get to a computer, just let me know. – Bradley Uffner May 05 '17 at 15:21
  • Please do. I've reread what you just wrote 5 times and I'm still trying to parse it. – Mike Smith May 05 '17 at 15:41
  • Code has been updated. I ended up using a field initializer instead of a constructor, making it closer to my original code. – Bradley Uffner May 05 '17 at 16:24
  • Sorry for the delay, ended up with a sick child. This works great for what I'll need it to do. Thank you! – Mike Smith May 07 '17 at 04:13
  • If this answer solved the problem from your question, you should activate the check mark next to my answer to mark it as accepted. – Bradley Uffner May 08 '17 at 13:49
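
For readers following the C# 5 discussion in the comments, here is a hedged side-by-side sketch of the two backing-field variants mentioned there. The class names `PlayerWithConstructor` and `PlayerWithFieldInitializer` are made up for illustration, and the C# 6 form appears only in a comment so the block still compiles under C# 5:

```csharp
using System;
using System.Collections.Generic;

public class Position { }

// C# 6 and later allow a get-only auto-property with an initializer:
//     public List<Position> Positions { get; } = new List<Position>();

// C# 5, variant 1: backing field initialized in a constructor.
public class PlayerWithConstructor
{
    private readonly List<Position> _positions;

    public PlayerWithConstructor()
    {
        _positions = new List<Position>();
    }

    public List<Position> Positions
    {
        get { return _positions; }
    }
}

// C# 5, variant 2: backing field with a field initializer, no constructor needed.
public class PlayerWithFieldInitializer
{
    private readonly List<Position> _positions = new List<Position>();

    public List<Position> Positions
    {
        get { return _positions; }
    }
}

class PropertyDemo
{
    static void Main()
    {
        var first = new PlayerWithConstructor();
        var second = new PlayerWithFieldInitializer();

        // Callers can add to the list but cannot replace it in either variant.
        first.Positions.Add(new Position());
        second.Positions.Add(new Position());

        Console.WriteLine(first.Positions.Count + " " + second.Positions.Count);
    }
}
```

Both variants behave the same from the caller's side; the field-initializer form is simply shorter, which is the one the updated answer uses.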