
First off, I'm not terribly experienced in XML. I know the very basics of reading in and writing it, but for the most part, things like schemas start to make my eyes cross really quickly. If it looks like I'm making incorrect assumptions about how XML works, there's a good chance that I am.

That disclaimer aside, this is a problem I've run into several times without finding an agreeable solution. I have an XML file that defines data, including nested entries (for example, a file might have a "Power" element with an "AlternatePowers" child node, which in turn contains "Power" elements). Ideally, I would like to generate a quick set of classes from this XML file to store the data I'm reading in. The general solution I've seen is to use Microsoft's xsd.exe tool to generate an XSD file from the XML file and then use the same tool to convert the schema into classes. The catch is that the tool chokes if an element is nested inside another element of the same name. Example:

- A column named 'Power' already belongs to this DataTable: cannot set 
a nested table name to the same name.

Is there a nice simple way to do this? I did a couple of searches for similar questions here, but the only questions I found dealing with generating schemas with nested elements with the same name were unanswered.

Alternately, it's also possible that I am completely misunderstanding how XML and XSD work and it's not possible to have such nesting...

Update

As an example, one of the things I'd like to parse is the XML output of a particular character builder program. Fair warning: this is a bit wordy even though I removed everything but the powers section.

<?xml version="1.0" encoding="ISO-8859-1"?>
<document>
  <product name="Hero Lab" url="http://www.wolflair.com" versionmajor="3" versionminor="7" versionpatch=" " versionbuild="256">Hero Lab® and the Hero Lab logo are Registered Trademarks of LWD Technology, Inc. Free download at http://www.wolflair.com
    Mutants &amp; Masterminds, Second Edition is ©2005-2011 Green Ronin Publishing, LLC. All rights reserved.</product>
  <hero active="yes" name="Pretty Deadly" playername="">
    <size name="Medium"/>
    <powers>
      <power name="Enhanced Trait 16" info="" ranks="16" cost="16" range="" displaylevel="0" summary="Traits: Constitution +6 (18, +4), Dexterity +8 (20, +5), Charisma +2 (12, +1)" active="yes">
        <powerdesc>You have an enhancement to a non-effect trait, such as an ability (including saving throws) or skill (including attack or defense bonus). Since Toughness save cannot be increased on its own,use the Protection effect instead of Enhanced Toughness (see Protection later in this chapter).</powerdesc>
        <descriptors/>
        <elements/>
        <options/>
        <traitmods>
          <traitmod name="Constitution" bonus="+6"/>
          <traitmod name="Dexterity" bonus="+8"/>
          <traitmod name="Charisma" bonus="+2"/>
        </traitmods>
        <flaws/>
        <powerfeats/>
        <powerdrawbacks/>
        <usernotes/>
        <alternatepowers/>
        <chainedpowers/>
        <otherpowers/>
      </power>
      <power name="Sailor Suit (Device 2)" info="" ranks="2" cost="8" range="" displaylevel="0" summary="Hard to lose" active="yes">
        <powerdesc>A device that has one or more powers and can be equipped and un-equipped.</powerdesc>
        <descriptors/>
        <elements/>
        <options/>
        <traitmods/>
        <flaws/>
        <powerfeats/>
        <powerdrawbacks/>
        <usernotes/>
        <alternatepowers/>
        <chainedpowers/>
        <otherpowers>
          <power name="Protection 6" info="+6 Toughness" ranks="6" cost="10" range="" displaylevel="1" summary="+6 Toughness; Impervious [4 ranks only]" active="yes">
            <powerdesc>You're particularly resistant to harm. You gain a bonus on your Toughness saving throws equal to your Protection rank.</powerdesc>
            <descriptors/>
            <elements/>
            <options/>
            <traitmods/>
            <extras>
              <extra name="Impervious" info="" partialranks="2">Your Protection stops some damage completely. If an attack has a damage bonus less than your Protection rank, it inflicts no damage (you automatically succeed on your Toughness saving throw). Penetrating damage (see page 112) ignores this modifier; you must save against it normally.</extra>
            </extras>
            <flaws/>
            <powerfeats/>
            <powerdrawbacks/>
            <usernotes/>
            <alternatepowers/>
            <chainedpowers/>
            <otherpowers/>
          </power>
        </otherpowers>
      </power>
    </powers>
  </hero>
</document>

Yes, there are a number of unnecessary tags in there, but it's an example of the kind of XML that I'd like to be able to plug in and get something reasonable out of. This XML, when passed to xsd.exe, generates the following error:

- A column named 'traitmods' already belongs to this DataTable: cannot set
a nested table name to the same name.
Chris J
Sean Duggan
  • Can you post your XML schema? I just used XSD today to generate C# classes based on the schema file, I had nested elements and it worked fine, so I think you might have a bug. – Dio Jan 12 '12 at 22:09
  • It's converting the XML to XSD where I have the issue. I will, however, post an excerpted version of the XML which triggers the error. – Sean Duggan Jan 13 '12 at 15:07
  • Step 1) Ditch schemas, and embrace XML. Step 2) Don't shred XML into native data types. Step 3) Create objects that use XML as their underlying data. The methods of the object should use XPath (selectSingleNode / selectNodes) or XSLT to access the data. – William Walseth Jan 13 '12 at 15:26
  • @WilliamWalseth I read the words you're saying, but I'm not following. What do you mean by "don't shred XML into native data types"? And honestly, what I do want to do is quickly generate classes, even very rough ones, from the XML. This is not the first project I've run into where I have bare XML which I need to parse and it would be handy to be able to get a rough outline rather than building it all by hand. I have no real stock in schemas other than that it seems to be the only supported way to go from XML to classes. – Sean Duggan Jan 13 '12 at 15:57
  • @Sean I'm talking about an entirely different approach. Think about it this way. Classes have methods that need to operate on data. In your shredding method, your class parses the document and stores everything in native data types or collections that are members of the class. In the XML method I'm proposing, you keep the XML as is and store it in a class member variable. Your methods get the data they need, when they need it, using XPath (selectNodes, selectSingleNode), and output to a database or browser with XSLT. – William Walseth Jan 13 '12 at 17:40
  • @WilliamWalseth Ah. As much as anything, I'm looking for a generalized solution for reading XML files into internal data structures, and for generating those data structures beforehand. I frequently run into situations where the data already exists and the next step is to pull it all in, and it would be nice to have something to "rough it in" rather than having to build each solution from scratch. – Sean Duggan Jan 16 '12 at 14:02
  • @Sean, I see. The XML approach requires flexibility on the internal member data structures in your objects. Sounds like you're stuck using native data types. – William Walseth Jan 16 '12 at 16:39
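For readers trying to picture the "keep the XML and query it with XPath" approach from the comments above, here is a minimal sketch in C#. It is only one way to read that suggestion, not code from the commenter. The names `PowerNode`, `XPathDemo`, `TopLevelPowers` and `SampleXml` are made up for illustration, and the sample string is a cut-down version of the XML in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical wrapper: it keeps the XmlNode and reads values on demand,
// instead of copying everything into fields up front.
public class PowerNode
{
    private readonly XmlNode node;

    public PowerNode(XmlNode node) { this.node = node; }

    // Attribute lookups return null when the attribute is absent.
    public string Name { get { return Attr("name"); } }
    public string Ranks { get { return Attr("ranks"); } }

    // Powers nested one container below this one (otherpowers,
    // alternatepowers, chainedpowers, ...), found with a relative XPath.
    public List<PowerNode> NestedPowers
    {
        get
        {
            var result = new List<PowerNode>();
            foreach (XmlNode child in node.SelectNodes("*/power"))
            {
                result.Add(new PowerNode(child));
            }
            return result;
        }
    }

    private string Attr(string name)
    {
        XmlAttribute attribute = node.Attributes[name];
        return attribute == null ? null : attribute.Value;
    }
}

public static class XPathDemo
{
    // Cut-down sample based on the XML in the question (assumption: real
    // files have the same document/hero/powers/power layout).
    public const string SampleXml =
        "<document><hero name='Pretty Deadly'><powers>" +
        "<power name='Sailor Suit (Device 2)' ranks='2'>" +
        "<alternatepowers/>" +
        "<otherpowers><power name='Protection 6' ranks='6'><otherpowers/></power></otherpowers>" +
        "</power>" +
        "</powers></hero></document>";

    public static List<PowerNode> TopLevelPowers(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var result = new List<PowerNode>();
        foreach (XmlNode node in doc.SelectNodes("/document/hero/powers/power"))
        {
            result.Add(new PowerNode(node));
        }
        return result;
    }

    public static void Main()
    {
        foreach (PowerNode power in TopLevelPowers(SampleXml))
        {
            Console.WriteLine(power.Name);
            foreach (PowerNode nested in power.NestedPowers)
            {
                Console.WriteLine("  " + nested.Name);
            }
        }
    }
}
```

The trade-off is the one discussed in the comments: nothing is generated ahead of time, so an attribute that disappears in a later file version shows up as a null rather than as a failure to load.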

3 Answers

2

I just finished helping someone with that. Try reading this thread here: https://stackoverflow.com/a/8840309/353147

Taking from your example and my link, you'd have classes like this.

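using System.Linq;
using System.Xml.Linq;

// Wraps a <Power> element; values are read from the XElement on demand.
public class Power
{
    XElement self;

    public Power(XElement power) { self = power; }

    // Note: Element() returns null if the child is missing.
    public AlternatePowers AlternatePowers
    { get { return new AlternatePowers(self.Element("AlternatePowers")); } }
}

// Wraps the <AlternatePowers> container element.
public class AlternatePowers
{
    XElement self;

    public AlternatePowers(XElement power) { self = power; }

    public Power2[] Powers
    { 
        get 
        { 
            return self.Elements("Power").Select(e => new Power2(e)).ToArray();
        }
    }
}

// Wraps a nested <Power> element.
public class Power2
{
    XElement self;

    public Power2(XElement power) { self = power; }
}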

Without knowing the rest of your xml, I cannot make the properties that make up each class/node level, but you should get the gist from here and from the link.

You'd then reference it like this:

Power power = new Power(XElement.Load("file"));
foreach(Power2 power2 in power.AlternatePowers.Powers)
{
    ...
}
Chuck Savage
  • So if I understand you correctly, I have to define a new class for each nested instance of Power? Power is the same construct in all cases no matter what its location. – Sean Duggan Jan 13 '12 at 15:05
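On the question of whether each nesting level needs its own class: as far as I can tell it does not, because a wrapper class can return instances of itself. The following is a rough sketch of that idea, not the answerer's code. It uses the lowercase element names from the XML posted in the question (`power`, `otherpowers`, `alternatepowers`), and the `Flatten` helper and `RecursiveDemo` class are hypothetical additions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// One wrapper class for <power>, whatever depth it appears at.
public class Power
{
    private readonly XElement self;

    public Power(XElement power) { self = power; }

    // The explicit cast yields null when the attribute is missing.
    public string Name { get { return (string)self.Attribute("name"); } }

    public IEnumerable<Power> OtherPowers { get { return Nested("otherpowers"); } }
    public IEnumerable<Power> AlternatePowers { get { return Nested("alternatepowers"); } }

    // Elements(container) is empty when the container is absent,
    // so no null check is needed before asking for its <power> children.
    private IEnumerable<Power> Nested(string container)
    {
        return self.Elements(container).Elements("power").Select(e => new Power(e));
    }

    // Hypothetical helper: this power followed by every power nested under it.
    public IEnumerable<Power> Flatten()
    {
        yield return this;
        foreach (Power child in OtherPowers.Concat(AlternatePowers))
        {
            foreach (Power descendant in child.Flatten())
            {
                yield return descendant;
            }
        }
    }
}

public static class RecursiveDemo
{
    // Cut-down sample based on the XML in the question.
    public const string SampleXml =
        "<power name='Sailor Suit (Device 2)'>" +
        "<alternatepowers/>" +
        "<otherpowers><power name='Protection 6'><otherpowers/></power></otherpowers>" +
        "</power>";

    public static List<string> AllNames(string xml)
    {
        var root = new Power(XElement.Parse(xml));
        return root.Flatten().Select(p => p.Name).ToList();
    }

    public static void Main()
    {
        foreach (string name in AllNames(SampleXml))
        {
            Console.WriteLine(name);
        }
    }
}
```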
0

Your error message implies that you are trying to generate a DataSet from the schema (/d switch), as opposed to a set of arbitrary classes decorated with XML Serializer attributes (/c switch).

I've not tried generating a DataSet like that myself, but I can see how it might fail. A DataSet is a collection of DataTables, which in turn contain a collection of DataRows. That's a fixed 3-level hierarchy. If your XML schema is more or less than 3 levels deep, then it won't fit into the required structure. Try creating a test DataSet in the designer and examine the generated .xsd file; that will show you what kind of schema structure will fit.

I can assure you from personal experience, if you convert the schema to a set of arbitrary classes instead, then it will handle pretty much any schema structure that you care to throw at it.
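To illustrate what "arbitrary classes decorated with XML Serializer attributes" can look like for the self-nesting case, here is a small hand-written sketch. It is not the output of xsd.exe; the class names `Power` and `PowerList` and the choice of fields are assumptions made for the example, and only a few attributes from the XML in the question are mapped (XmlSerializer ignores unmapped elements and attributes by default).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// A power can contain more powers of the same type; XmlSerializer
// handles this kind of recursive type reference.
public class Power
{
    [XmlAttribute("name")]
    public string Name;

    // Kept as a string so odd values do not break deserialization.
    [XmlAttribute("ranks")]
    public string Ranks;

    // <otherpowers><power .../>...</otherpowers>
    [XmlArray("otherpowers")]
    [XmlArrayItem("power")]
    public List<Power> OtherPowers = new List<Power>();
}

[XmlRoot("powers")]
public class PowerList
{
    // Repeated <power> elements directly under <powers>.
    [XmlElement("power")]
    public List<Power> Powers = new List<Power>();
}

public static class SerializerDemo
{
    // Cut-down sample based on the <powers> section in the question.
    public const string SampleXml =
        "<powers>" +
        "<power name='Sailor Suit (Device 2)' ranks='2'>" +
        "<otherpowers><power name='Protection 6' ranks='6'><otherpowers/></power></otherpowers>" +
        "</power>" +
        "</powers>";

    public static PowerList Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(PowerList));
        using (var reader = new StringReader(xml))
        {
            return (PowerList)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        PowerList list = Load(SampleXml);
        foreach (Power power in list.Powers)
        {
            Console.WriteLine(power.Name + " contains " + power.OtherPowers.Count + " other power(s)");
        }
    }
}
```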

Christian Hayter
  • Hi Christian, your answer is partially correct, which is why I can't up/down-vote it. It is correct in that the switch he is using generates a DataSet. If that's fixed, that's all it needs. However, it is totally wrong to state that XSD to DataSet wouldn't work for more than "3 levels deep". I can quickly produce an example with an arbitrarily high number of nesting levels. The issue here is in the assumptions that the tool xsd.exe makes when generating DataTables. In this case the issue is the Power element being nested as described; typically, xsd.exe won't handle such a scenario. – Petru Gardea Jan 13 '12 at 00:47
  • Fair enough, I've not tried it myself so I could only guess at the result. – Christian Hayter Jan 13 '12 at 08:19
0

So, it's not pretty, but the following is what I wound up with as a solution. I run processElement on the base node and then I go through extantElements and export the class code.

using System.Collections;
using System.Collections.Generic;
using System.Text;
using System.Xml;

namespace XMLToClasses
{
    public class Element
    {
        public string Name;
        public HashSet<string> attributes;
        public HashSet<string> children;

        public bool hasText;

        public Element()
        {
            Name = "";

            attributes = new HashSet<string>();
            children = new HashSet<string>();

            hasText = false;
        }

        public string getSource()
        {
            StringBuilder sourceSB = new StringBuilder();

            sourceSB.AppendLine("[Serializable()]");
            sourceSB.AppendLine("public class cls_" + Name);
            sourceSB.AppendLine("{");

            sourceSB.AppendLine("\t// Attributes" );

            if (hasText)
            {
                sourceSB.AppendLine("\tpublic string InnerText;");
            }

            foreach(string attribute in attributes)
            {
                sourceSB.AppendLine("\tpublic string atr_" + attribute + ";");
            }
            sourceSB.AppendLine("");
            sourceSB.AppendLine("\t// Children");
            foreach (string child in children)
            {
                sourceSB.AppendLine("\tpublic List<cls_" + child + "> list" + child + ";");
            }

            sourceSB.AppendLine("");
            sourceSB.AppendLine("\t// Constructor");
            sourceSB.AppendLine("\tpublic cls_" + Name + "()");
            sourceSB.AppendLine("\t{");
            foreach (string child in children)
            {
                sourceSB.AppendLine("\t\tlist" + child + " = new List<cls_" + child + ">()" + ";");
            }
            sourceSB.AppendLine("\t}");

            sourceSB.AppendLine("");
            sourceSB.AppendLine("\tpublic cls_" + Name + "(XmlNode xmlNode) : this ()");
            sourceSB.AppendLine("\t{");

            if (hasText)
            {
                sourceSB.AppendLine("\t\tInnerText = xmlNode.InnerText;");
                sourceSB.AppendLine("");
            }            

            foreach (string attribute in attributes)
            {
                sourceSB.AppendLine("\t\tif (xmlNode.Attributes[\"" + attribute + "\"] != null)");
                sourceSB.AppendLine("\t\t{");
                sourceSB.AppendLine("\t\t\tatr_" + attribute + " = xmlNode.Attributes[\"" + attribute + "\"].Value;");
                sourceSB.AppendLine("\t\t}");
            }

            sourceSB.AppendLine("");

            foreach (string child in children)
            {
                sourceSB.AppendLine("\t\tforeach (XmlNode childNode in xmlNode.SelectNodes(\"./" + child + "\"))");
                sourceSB.AppendLine("\t\t{");
                sourceSB.AppendLine("\t\t\tlist" + child + ".Add(new cls_" + child + "(childNode));");
                sourceSB.AppendLine("\t\t}");
            }

            sourceSB.AppendLine("\t}");

            sourceSB.Append("}");

            return sourceSB.ToString();
        }
    }

    public class XMLToClasses
    {
        public Hashtable extantElements;

        public XMLToClasses()
        {
            extantElements = new Hashtable();
        }

        public Element processElement(XmlNode xmlNode)
        {
            Element element;

            if (extantElements.Contains(xmlNode.Name))
            {
                element = (Element)extantElements[xmlNode.Name];
            }
            else
            {
                element = new Element();
                element.Name = xmlNode.Name;

                extantElements.Add(element.Name, element);
            }            

            if (xmlNode.Attributes != null)
            {
                foreach (XmlAttribute attribute in xmlNode.Attributes)
                {
                    if (!element.attributes.Contains(attribute.Name))
                    {
                        element.attributes.Add(attribute.Name);
                    }
                }
            }


            if (xmlNode.ChildNodes != null)
            {
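                foreach (XmlNode node in xmlNode.ChildNodes)
                {
                    if (node.NodeType == XmlNodeType.Text || node.NodeType == XmlNodeType.CDATA)
                    {
                        element.hasText = true;
                    }
                    else if (node.NodeType == XmlNodeType.Element)
                    {
                        // Comments, processing instructions and whitespace are skipped,
                        // so they never end up as generated classes.
                        Element childNode = processElement(node);

                        if (!element.children.Contains(childNode.Name))
                        {
                            element.children.Add(childNode.Name);
                        }
                    }
                }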
            }

            return element;
        }
    }
}

I'm sure there are ways to make this prettier or make it work better, but it's sufficient for me.

Edit: Added ugly but functional deserialization code that takes an XmlNode containing the object and decodes it.

Later Thoughts: Two years later, I had an opportunity to reuse this code. Not only have I not kept it up to date here (I have since made changes to better normalize the names of the items), but I think the commenters who said I was going about this the wrong way were right. I still think this could be a handy way of generating template classes for an XML file where a given type of element can show up at different depths, but it's inflexible: you have to rerun the code and re-extract the classes every time. It also doesn't handle versioning well. Between when I first wrote this code (to let me quickly create a character file converter) and now, the format changed, and I had people complaining that the converter had stopped working. In retrospect, it would have made more sense to search for the correct elements using XPath and pull the data from there.

Still, it was a valuable experience, and I suspect I'm probably going to come back to this code from time to time for quickly roughing out XML data, at least until I find something better.

Sean Duggan