I have a fairly elaborate XML document and have been able to parse most of it. However, I've come across one subtree that has me stumped, and I'm afraid I'm making this harder than it needs to be. Here is the XML I'm referring to:

<Codes>
            <CustomFieldValueSet name="Account" label="Account" distributionType="PercentOfPrice">
                <CustomFieldValue distributionValue="10.00" splitindex="0">
                    <Value>7200</Value>
                    <Description>General Supplies</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="1">
                    <Value>7200</Value>
                    <Description>General Supplies</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="2">
                    <Value>7200</Value>
                    <Description>General Supplies</Description>
                </CustomFieldValue>
            </CustomFieldValueSet>
            <CustomFieldValueSet name="Activity" label="Activity" distributionType="PercentOfPrice" />
            <CustomFieldValueSet name="Chart" label="Chart" distributionType="PercentOfPrice">
                <CustomFieldValue distributionValue="10.00" splitindex="0">
                    <Value>T</Value>
                    <Description>University</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="1">
                    <Value>T</Value>
                    <Description>University</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="2">
                    <Value>T</Value>
                    <Description>University</Description>
                </CustomFieldValue>
            </CustomFieldValueSet>
            <CustomFieldValueSet name="Fund" label="Fund" distributionType="PercentOfPrice">
                <CustomFieldValue distributionValue="10.00" splitindex="0">
                    <Value>360806</Value>
                    <Description>National Institutes of Health</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="1">
                    <Value>360903</Value>
                    <Description>National  Institutes of Health</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="2">
                    <Value>360957</Value>
                    <Description>National Institutes of Health</Description>
                </CustomFieldValue>
            </CustomFieldValueSet>
            <CustomFieldValueSet name="Program" label="Program" distributionType="PercentOfPrice">
                <CustomFieldValue distributionValue="10.00" splitindex="0">
                    <Value>02</Value>
                    <Description>Research</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="1">
                    <Value>02</Value>
                    <Description>Research</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="2">
                    <Value>02</Value>
                    <Description>Research</Description>
                </CustomFieldValue>
            </CustomFieldValueSet>
            <CustomFieldValueSet name="Location" label="Location" distributionType="PercentOfPrice">
                <CustomFieldValue distributionValue="10.00" splitindex="0">
                    <Value>015</Value>
                    <Description>Biology - Life Science</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="1">
                    <Value>015</Value>
                    <Description>Biology - Life Science</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="2">
                    <Value>015</Value>
                    <Description>Biology - Life Science</Description>
                </CustomFieldValue>
            </CustomFieldValueSet>
            <CustomFieldValueSet name="Organization" label="Organization" distributionType="PercentOfPrice">
                <CustomFieldValue distributionValue="10.00" splitindex="0">
                    <Value>04400</Value>
                    <Description>TUSM:Neuroscience</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="1">
                    <Value>04400</Value>
                    <Description>TUSM:Neuroscience</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="2">
                    <Value>04400</Value>
                    <Description>TUSM:Neuroscience</Description>
                </CustomFieldValue>
            </CustomFieldValueSet>
        </Codes>

I'm trying to end up with a list that would look something like this:

Account distributionType   Activity   distributionValue  Fund
7200     PercentOfPrice     ""        10                 360806
7200     PercentOfPrice     ""        45                 360903
7200     PercentOfPrice     ""        45                 360957

etc...

I have written code that looks something like this. Here is a snippet. Mind you, I think I have overcomplicated this.

if (tagName == "Codes")
                                {
                                  // Create another reader that contains just the accounting elements.
                                    XmlReader inner = reader.ReadSubtree();
                                    //inner.ReadToDescendant("Codes");
                                    //printOutXML(inner);
                                    while (inner.Read())
                                    {
                                        switch (inner.NodeType)
                                        {       
                                            //walk down the xml hiearchy then simply  fill in the values.
                                            case XmlNodeType.Element:

                                                switch (inner.Name) // use the subtree reader; the outer reader should not be touched until inner is closed
                                                {
                                                    case "CustomFieldValueSet":
                                                       //get the attribute that we are currently working with such as account and  
                                                        innerTagName=inner.GetAttribute("name");

                                                        // activity and location can potentially be blank therefore i will check here if it is 
                                                        //and if it is i will immediate assign the activity list a set of empty quotes.
                                                        if (innerTagName == "Activity")
                                                        {
                                                            if (inner.IsEmptyElement)
                                                            {   //quickly put fillers in .
                                                                for (int i = 0; i < thisInvoice.account.Count; i++)
                                                                {
                                                                    thisInvoice.activity.Add("");
                                                                }
                                                            }         
                                                        }

                                                        if (innerTagName == "Location")
                                                        {
                                                            if (inner.IsEmptyElement)
                                                            {   //quickly put fillers in .
                                                                for (int i = 0; i < thisInvoice.account.Count; i++)
                                                                {
                                                                    thisInvoice.location.Add("");
                                                                }
                                                                //thisInvoice.activity.Add("");
                                                            }
                                                        }

                                                        if (null == inner.GetAttribute("distributionType"))
                                                        {
                                                            distType = null;
                                                        }
                                                       else if
                                                       (distributionSwitch == false)
                                                        {
                                                            thisInvoice.distributionType.Add(inner.GetAttribute("distributionType") ?? "");
                                                            distType = inner.GetAttribute("distributionType") ?? "";
                                                       }
                                                        //Console.WriteLine(inner.Value);
                                                        //Console.WriteLine(inner.Name);
                                                        break;

                                                    case "CustomFieldValue":
                                                        if(null == inner.GetAttribute("distributionValue"))
                                                        //thisInvoice.distributionValue.Add(inner.GetAttribute("distributionValue") ?? "");
                                                        {/*do nothing*/}
                                                    else if
                                                        (distributionSwitch == false)
                                                        {
                                                            thisInvoice.distributionValue.Add(inner.GetAttribute("distributionValue") ?? "");
                                                        }
                                                        // Compare the list lengths: if distributionType is shorter than distributionValue,
                                                        // pad distributionType with the current distribution type.
                                                        if (thisInvoice.distributionValue.Count > thisInvoice.distributionType.Count)
                                                        {
                                                            for (int i = 0; i < thisInvoice.distributionValue.Count - thisInvoice.distributionType.Count; i++)
                                                            {
                                                                thisInvoice.distributionType.Add(distType);
                                                            }



                                                        }

                                                        break;

                                                    case "Value":
                                                         // XmlNodeType.Text
                                                        if (innerTagName == "Account"/*&& inner.NodeType ==XmlNodeType.Text*/)
                                                        {
                                                            inner.MoveToContent();// move to the text 
                                                            inner.Read();
                                                            thisInvoice.account.Add(inner.Value);
                                                        }


                                                        if (innerTagName == "Activity")
                                                        {
                                                            // Activity is not a mandatory field, so it could be empty; therefore we need
                                                            // to check whether it is a self-closing tag and, if so, add an empty string.
                                                            if (inner.IsEmptyElement)
                                                            {
                                                                thisInvoice.activity.Add("");
                                                            }
                                                            else
                                                            {
                                                                inner.MoveToContent();// move to the text 
                                                                inner.Read();
                                                                thisInvoice.activity.Add(inner.Value);
                                                            }
                                                        }

                                                        if (innerTagName == "Location")
                                                        {
                                                            if (inner.IsEmptyElement)
                                                            {
                                                                thisInvoice.location.Add("");
                                                            }
                                                            else
                                                            {
                                                                inner.MoveToContent();// move to the text 
                                                                inner.Read();
                                                                thisInvoice.location.Add(inner.Value);
                                                            }
                                                        }

                                                        if (innerTagName == "Fund")
                                                        {
                                                            inner.MoveToContent();// move to the text 
                                                            inner.Read();
                                                            thisInvoice.fund.Add(inner.Value);
                                                        }

                                                        if (innerTagName == "Organization")
                                                        {
                                                            inner.MoveToContent();// move to the text 
                                                            inner.Read();
                                                            thisInvoice.org.Add(inner.Value);
                                                        }

                                                        if (innerTagName == "Program")
                                                        {
                                                            inner.MoveToContent();// move to the text 
                                                            inner.Read();
                                                            thisInvoice.prog.Add(inner.Value);
                                                        }

                                                       break;



                                                }//end switch
                                                break; // break out of the outer case
                                            case XmlNodeType.EndElement:
                                                if (inner.Name == "CustomFieldValueSet" || inner.Value == "CustomFieldValueSet")
                                                {
                                                    distributionSwitch = true;
                                                    Console.WriteLine(reader.Value);
                                                    Console.WriteLine(reader.Name);
                                                }
                                                if (inner.Name == "Codes")
                                                {
                                                    distributionSwitch = false;
                                                    distType = null;
                                                    inner.Close();
                                                }

                                                break;
                                        }//end switch
                                    }//end while
                                }//end the if;

In the case of the distributionType attribute, I need to make its list as long as the account list. In other words, once I have it in a variable, I need to use it as a filler so that the distribution type list is as big as the account list. I can't imagine there isn't an easier way to do this. I keep looking at LINQ to XML, but it does not make much sense to me. I would love to hear how some of you experts would tackle this one. I'm trying to put together an elegant solution with a little less code. Any help would be greatly appreciated.

Miguel
  • As a first question, why did you not go the route of deserializing the XML into classes, but parsing the XML yourself? – Bernd Linde Oct 29 '14 at 13:18
  • Bernd, I'm a noob at working with XML. I get an XML file and I open and process it. Eventually I have to make a list of arrays to insert into DB tables. Deserializing the XML might be the best option, but at this point I don't know any better. – Miguel Oct 29 '14 at 13:22
  • Give me a bit and I will type up something for you. In the meantime have a look at the answers and link in [this post](http://stackoverflow.com/questions/26257041) – Bernd Linde Oct 29 '14 at 13:26
  • Thanks Bernd I'll look at the post very closely. – Miguel Oct 29 '14 at 13:28
  • @BerndLinde, awesome link!! – John Bustos Oct 29 '14 at 13:45
  • I could be DEAD wrong (hence just posting it as a comment), but I'm seeing that your XML has the following structure: `CustomFieldValueSet` with a `name` and `CustomFieldValue` children with a unique `splitindex` and data for scraping. Why not parse each `CustomFieldValueSet` the same way and add its scraped data to a list of objects based upon the `splitindex` value of the children? It will automatically have an account associated with it and get all the other corresponding information... Simply put, `Account` should not be your "primary key"; rather, `splitindex` should be. – John Bustos Oct 29 '14 at 14:05
  • John, that's another great idea, with one caveat: the split index and the distribution type are not always there. This is an XML I'm receiving from an external vendor, so I don't have the option to change it. – Miguel Oct 29 '14 at 14:13
  • By "not always there" do you mean not in all keys or sometimes just not in the XML even for something like `account`? And, if so, what's their logic? (These XMLs are created by a computer, so the logic is usually pretty standardized) Is that only if there is no split? I still believe this could be made to work... – John Bustos Oct 29 '14 at 14:14
  • John, what I mean is that those attributes are not always there in the XML. They show up if and only if the item has been split; otherwise they do not show up at all, which is why I chose account to key off. – Miguel Oct 29 '14 at 14:21
  • Here is another link explaining almost all the different [methods of parsing an XML file](http://stackoverflow.com/questions/55828) – Bernd Linde Oct 29 '14 at 14:36

2 Answers

You can use Linq to XML for this.

using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static void Main(string[] args) {

// This txt file contains your xml.
var xml_sample = File.ReadAllText("xml_sample.txt");
var doc = XDocument.Parse(xml_sample);

// Get all <CustomFieldValueSet> that have the label attribute `Account`
var accounts = from item in doc.Descendants("Codes").Descendants("CustomFieldValueSet")
               // The string cast returns null (instead of throwing) when the attribute is missing.
               where (string)item.Attribute("label") == "Account"
               select item;

// Create an anonymous type containing the value of the 
// distributionValue attribute and the <Value> node.
var accountValue = from el in accounts.Descendants("CustomFieldValue")
                   let distAttribute = el.Attribute("distributionValue")
                   select new
                   {
                       distValue = distAttribute != null ? distAttribute.Value : "0",
                       value = el.Descendants("Value").First().Value,
                   };

// Display stuff here just to make sure we got it right.
// distributionType may be absent, so fall back to an empty string.
accounts.ToList().ForEach(el => 
    Console.WriteLine(el.Name + " " + ((string)el.Attribute("distributionType") ?? "")));

accountValue.ToList().ForEach(el => 
    Console.WriteLine(el.distValue + ":"+ el.value));
}

You should be able to use these ideas to parse your XML file as needed.
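
To get closer to the table in the question, the same LINQ to XML ideas can be extended to pivot every `CustomFieldValueSet` into rows. The following is only a rough sketch under some assumptions: values are matched by their position inside each set (because, per the comments, `splitindex` is not always present), the class name `CodesPivot` and method `ToRows` are made up for illustration, and the distribution attributes are taken from the first set that carries them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class CodesPivot
{
    // Returns one dictionary per split row. Keys are the set names
    // plus "distributionType" and "distributionValue".
    public static List<Dictionary<string, string>> ToRows(XElement codes)
    {
        var sets = codes.Elements("CustomFieldValueSet").ToList();

        // The longest set decides how many rows there are.
        int rowCount = sets.Select(s => s.Elements("CustomFieldValue").Count())
                           .DefaultIfEmpty(0)
                           .Max();

        var rows = new List<Dictionary<string, string>>();
        for (int i = 0; i < rowCount; i++)
        {
            var row = new Dictionary<string, string>();
            foreach (var set in sets)
            {
                string name = (string)set.Attribute("name") ?? "";

                // Assumption: values are paired by position, not by splitindex.
                XElement value = set.Elements("CustomFieldValue").ElementAtOrDefault(i);

                // Empty sets such as <CustomFieldValueSet name="Activity" /> yield "".
                row[name] = value == null ? "" : ((string)value.Element("Value") ?? "");

                // Take the distribution attributes from the first set that has them.
                string type = (string)set.Attribute("distributionType");
                if (type != null && !row.ContainsKey("distributionType"))
                    row["distributionType"] = type;

                string dist = value == null ? null : (string)value.Attribute("distributionValue");
                if (dist != null && !row.ContainsKey("distributionValue"))
                    row["distributionValue"] = dist;
            }

            // Both attributes are optional in the vendor XML, so default them.
            if (!row.ContainsKey("distributionType")) row["distributionType"] = "";
            if (!row.ContainsKey("distributionValue")) row["distributionValue"] = "";
            rows.Add(row);
        }
        return rows;
    }

    public static void Main()
    {
        // Shortened version of the sample XML from the question.
        var codes = XElement.Parse(
            "<Codes>" +
            "<CustomFieldValueSet name=\"Account\" distributionType=\"PercentOfPrice\">" +
            "<CustomFieldValue distributionValue=\"10.00\"><Value>7200</Value></CustomFieldValue>" +
            "<CustomFieldValue distributionValue=\"45.00\"><Value>7200</Value></CustomFieldValue>" +
            "</CustomFieldValueSet>" +
            "<CustomFieldValueSet name=\"Activity\" distributionType=\"PercentOfPrice\" />" +
            "<CustomFieldValueSet name=\"Fund\" distributionType=\"PercentOfPrice\">" +
            "<CustomFieldValue distributionValue=\"10.00\"><Value>360806</Value></CustomFieldValue>" +
            "<CustomFieldValue distributionValue=\"45.00\"><Value>360903</Value></CustomFieldValue>" +
            "</CustomFieldValueSet>" +
            "</Codes>");

        foreach (var row in ToRows(codes))
            Console.WriteLine(string.Join(" | ", row.Select(kv => kv.Key + "=" + kv.Value)));
    }
}
```

Because every set goes through the same loop, there is no need for per-field `if` branches or for padding lists afterwards. If `splitindex` turns out to be reliable for split items, the positional match could be swapped for a lookup on that attribute.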

mihai
  • Thanks Mihai, I will also give this a try. I appreciate the response. – Miguel Oct 29 '14 at 13:56
  • Mihai, just one more question: the distribution type is not always part of the XML, in which case I would get an error. Is there a way around this? Sorry, I should have mentioned this in my initial question. – Miguel Oct 29 '14 at 14:10
  • You can check if the attribute exists before accessing it: [XML parse check if attribute exist](http://stackoverflow.com/questions/13342143/xml-parse-check-if-attribute-exist), for example. I'm sure some other methods exist. – mihai Oct 29 '14 at 14:13
  • Yes, I understand that, but I'm trying to do it within the LINQ query, I suppose. Whenever I run this line of code, `Console.WriteLine(el.distValue + ":"+ el.value));`, I get an "object reference not set to an instance of an object" error. – Miguel Oct 29 '14 at 14:19
  • Miguel, please check the updated answer. As I've said, you can check if the attribute exists. Here, I've defined a new variable `distAttribute` that is checked in the `select new` clause. You should replace the "0" string with whatever value you feel it should take when the `distributionValue` attribute is missing from your XML. – mihai Oct 29 '14 at 14:55

As mentioned in the comments, an alternative to Mihai's LINQ to XML solution is to deserialize your XML into a predefined structure of typed classes and properties.

The benefit is that you then have an object that represents your XML (hopefully faithfully), which makes it easier to work with the data that was inside the XML.

With the supplied XML sample and the Edit -> Paste Special -> Paste XML as Classes menu option in Visual Studio, you will get a class structure similar to the one below (this one has been refined a bit for easier reading):

using System.Collections.Generic;
using System.Xml.Serialization;

[XmlTypeAttribute(AnonymousType = true)]
[XmlRootAttribute(Namespace = "", IsNullable = false)]
public partial class Codes
{
  [XmlElementAttribute("CustomFieldValueSet")]
  public List<CodesCustomFieldValueSet> CustomFieldValueSet { get; set; }
}

[XmlTypeAttribute(AnonymousType = true)]
public partial class CodesCustomFieldValueSet
{
  [XmlElementAttribute("CustomFieldValue")]
  public List<CodesCustomFieldValueSetCustomFieldValue> CustomFieldValue { get; set; }

  [XmlAttributeAttribute(AttributeName="name")]
  public string Name { get; set; }

  [XmlAttributeAttribute(AttributeName = "label")]
  public string Label { get; set; }

  [XmlAttributeAttribute(AttributeName = "distributionType")]
  public string DistributionType { get; set; }
}

[XmlTypeAttribute(AnonymousType = true)]
public partial class CodesCustomFieldValueSetCustomFieldValue
{
  public string Value { get; set; }

  public string Description { get; set; }

  [XmlAttributeAttribute(AttributeName = "distributionValue")]
  public decimal DistributionValue { get; set; }

  [XmlAttributeAttribute(AttributeName = "splitindex")]
  public byte SplitIndex { get; set; }
}

With this class structure, you can then deserialize your XML with the lines below
(where txtInput.Text is a TextBox I used to hold the sample XML data; StringReader needs `using System.IO;`):

XmlSerializer serializer = new XmlSerializer(typeof(Codes));
Codes codesInput = serializer.Deserialize(new StringReader(txtInput.Text)) as Codes;

if (codesInput != null)
{
  // Do something with the data
}

NOTE:
Given your desired output and the structure of the sample XML you supplied, you will still need to transform the information in the deserialized object into the shape you want. For that I would recommend creating an additional class, combined with a List<T>, to hold all the information as shown in your desired output.
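
As a minimal sketch of that transformation step, the example below flattens a deserialized object into one row object per split. It is an illustration under assumptions, not a finished implementation: it uses trimmed-down, renamed classes (`ValueSet`, `FieldValue`) rather than the generated ones above, the `SplitRow` and `RowBuilder` names are hypothetical, `distributionValue` is read as a string so that a missing attribute stays empty, and the `Account` set is assumed to decide the number of rows (as in the question), with values matched by position.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

// Trimmed-down stand-ins for the generated classes (names are assumptions).
[XmlRoot("Codes")]
public class Codes
{
    [XmlElement("CustomFieldValueSet")]
    public List<ValueSet> Sets { get; set; }
}

public class ValueSet
{
    [XmlAttribute("name")] public string Name { get; set; }
    [XmlAttribute("distributionType")] public string DistributionType { get; set; }
    [XmlElement("CustomFieldValue")] public List<FieldValue> Values { get; set; }
}

public class FieldValue
{
    // Kept as a string so a missing attribute deserializes to null.
    [XmlAttribute("distributionValue")] public string DistributionValue { get; set; }
    public string Value { get; set; }
}

// Hypothetical row type mirroring the columns of the desired output.
public class SplitRow
{
    public string Account { get; set; }
    public string Activity { get; set; }
    public string Fund { get; set; }
    public string DistributionType { get; set; }
    public string DistributionValue { get; set; }
}

public static class RowBuilder
{
    static ValueSet Find(Codes codes, string name)
    {
        return (codes.Sets ?? new List<ValueSet>()).FirstOrDefault(s => s.Name == name);
    }

    // Returns "" for missing sets, empty sets, or short sets.
    static string ValueAt(ValueSet set, int i)
    {
        if (set == null || set.Values == null || i >= set.Values.Count) return "";
        return set.Values[i].Value ?? "";
    }

    public static List<SplitRow> ToRows(Codes codes)
    {
        var rows = new List<SplitRow>();
        ValueSet account = Find(codes, "Account");
        if (account == null || account.Values == null) return rows;

        // Assumption: the Account set decides how many rows exist.
        for (int i = 0; i < account.Values.Count; i++)
        {
            rows.Add(new SplitRow
            {
                Account = ValueAt(account, i),
                Activity = ValueAt(Find(codes, "Activity"), i),
                Fund = ValueAt(Find(codes, "Fund"), i),
                DistributionType = account.DistributionType ?? "",
                DistributionValue = account.Values[i].DistributionValue ?? ""
            });
        }
        return rows;
    }

    public static void Main()
    {
        string xml =
            "<Codes>" +
            "<CustomFieldValueSet name=\"Account\" distributionType=\"PercentOfPrice\">" +
            "<CustomFieldValue distributionValue=\"10.00\"><Value>7200</Value></CustomFieldValue>" +
            "<CustomFieldValue distributionValue=\"45.00\"><Value>7200</Value></CustomFieldValue>" +
            "</CustomFieldValueSet>" +
            "<CustomFieldValueSet name=\"Activity\" distributionType=\"PercentOfPrice\" />" +
            "<CustomFieldValueSet name=\"Fund\" distributionType=\"PercentOfPrice\">" +
            "<CustomFieldValue distributionValue=\"10.00\"><Value>360806</Value></CustomFieldValue>" +
            "<CustomFieldValue distributionValue=\"45.00\"><Value>360903</Value></CustomFieldValue>" +
            "</CustomFieldValueSet>" +
            "</Codes>";

        var codes = (Codes)new XmlSerializer(typeof(Codes)).Deserialize(new StringReader(xml));
        foreach (SplitRow r in ToRows(codes))
            Console.WriteLine(r.Account + " " + r.DistributionType + " \"" + r.Activity + "\" " +
                              r.DistributionValue + " " + r.Fund);
    }
}
```

The remaining columns (Chart, Program, Location, Organization) would follow the same `ValueAt(Find(...), i)` pattern, and the resulting `List<SplitRow>` can be queried with LINQ like any other list.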

Even better would be if you controlled the XML's structure and could make it more self-explanatory than it currently is. As it stands, the only link between the CustomFieldValueSet elements appears to be the splitindex, which is an attribute of their child nodes, and that complicates things a lot.

Further reading on XML Serialization:
MSDN: Introducing XML Serialization
The XmlSerializer Class

Bernd Linde
  • I like your solution. I've tested it and it works nice. So +1 from me. – mihai Oct 29 '14 at 15:23
  • Thanks, the only issue it might have is that to use it with LINQ is a bit harder than straight out from the XML, but still possible with the List< >, I think – Bernd Linde Oct 29 '14 at 15:25