Reading XML and creating a frequency count of elements in C#

Question

I have an XML file in this format (only much bigger):

<customer>
    <name>John</name>
    <age>24</age>
    <gender>M</gender>
</customer>
<customer>
    <name>Keith</name>
    <age></age>         <!--blank value-->
    <gender>M</gender>
</customer>
<customer>
    <name>Jenny</name>
    <age>21</age>
    <gender>F</gender>
</customer>
<customer>
    <name>John</name>   
    <age></age>         <!--blank value-->
    <gender></gender>   <!--blank value-->
</customer>

I want to generate a DataTable in this format:

Element Name   Value    Frequency

name           filled   4
name           blank    0

age            filled   2
age            blank    2

gender         filled   3
gender         blank    1

Currently I do this in two steps. First I create a DataTable with the structure above and set every frequency to 0. Then I read the XML with XmlReader and increment the matching frequency every time the reader finds a child element.

My problem is that the second step, the function that adds the actual counts, takes too long on very large XML files with many customers, each having many child elements. How can I make this function more efficient?

My code :

static void AddCount(DataTable dt)
{
    int count;
    using (XmlReader reader = XmlReader.Create(@"C:\Usr\sample.xml"))
    {
        while (reader.Read())
        {
            if (reader.IsStartElement())
            {
                string eleName = reader.Name;
                DataRow[] foundElements = dt.Select("ElementName = '" + eleName + "'");
                if (!reader.IsEmptyElement)
                {
                    count = int.Parse(foundElements.ElementAt(0)["Frequency"].ToString());
                    foundElements.ElementAt(0).SetField("Frequency", count + 1);
                }
                else
                {
                    count = int.Parse(foundElements.ElementAt(0)["Frequency"].ToString());
                    foundElements.ElementAt(0).SetField("Frequency", count + 1);
                }
            }
        }
    }
}

I am also open to replacing XmlReader with a more efficient class for this task. Any advice is welcome.

score 2 · Accepted Answer · answered Dec 21 '15 at 06:42

It turned out that querying in the DataTable using the Select operation was very expensive and that was making my function very slow.

Instead, I used a Dictionary<string, ValueFrequencyModel>, updated the counts in it while reading the XML, and converted the dictionary into a DataTable once reading was complete.

This saved loads of time for me and solved the problem.
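The answer does not show its code, so the following is only a minimal sketch of the approach it describes. The members of ValueFrequencyModel, the ElementFrequency class, its method names, and the recordName parameter are all assumptions made for illustration, and the sketch assumes the customer elements sit inside a single root element. Each element costs one dictionary lookup instead of a DataTable.Select call, and the DataTable is built once at the end.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Xml;

// Hypothetical model: the answer names this type but does not show its members.
class ValueFrequencyModel
{
    public int Filled;
    public int Blank;
}

static class ElementFrequency
{
    // Streams the XML once and tallies filled/blank direct children of each record.
    public static Dictionary<string, ValueFrequencyModel> Count(XmlReader reader, string recordName)
    {
        var counts = new Dictionary<string, ValueFrequencyModel>();
        int fieldDepth = -1;     // depth of a record's children; -1 while outside a record
        string current = null;   // name of the field currently open, if any
        bool hasText = false;    // true once non-whitespace text is seen inside it

        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    if (fieldDepth < 0)
                    {
                        if (reader.Name == recordName && !reader.IsEmptyElement)
                            fieldDepth = reader.Depth + 1;
                    }
                    else if (reader.Depth == fieldDepth)
                    {
                        if (!counts.TryGetValue(reader.Name, out var model))
                            counts[reader.Name] = model = new ValueFrequencyModel();
                        if (reader.IsEmptyElement)
                        {
                            model.Blank++;            // <age/> has no end tag to wait for
                        }
                        else
                        {
                            current = reader.Name;
                            hasText = false;
                        }
                    }
                    break;

                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                    if (current != null && reader.Value.Trim().Length > 0)
                        hasText = true;
                    break;

                case XmlNodeType.EndElement:
                    if (current != null && reader.Depth == fieldDepth)
                    {
                        if (hasText) counts[current].Filled++; else counts[current].Blank++;
                        current = null;
                    }
                    else if (fieldDepth >= 0 && reader.Depth == fieldDepth - 1)
                    {
                        fieldDepth = -1;              // left the record element
                    }
                    break;
            }
        }
        return counts;
    }

    // Converts the tallies into the three-column layout from the question.
    public static DataTable ToDataTable(Dictionary<string, ValueFrequencyModel> counts)
    {
        var dt = new DataTable();
        dt.Columns.Add("ElementName", typeof(string));
        dt.Columns.Add("Value", typeof(string));
        dt.Columns.Add("Frequency", typeof(int));
        foreach (var pair in counts)
        {
            dt.Rows.Add(pair.Key, "filled", pair.Value.Filled);
            dt.Rows.Add(pair.Key, "blank", pair.Value.Blank);
        }
        return dt;
    }
}
```

A call might look like ElementFrequency.ToDataTable(ElementFrequency.Count(reader, "customer")), with reader created by XmlReader.Create on the file. Note that the sketch tracks text nodes because IsEmptyElement is true only for the self-closing form, not for an element written as an empty start and end tag pair.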

score 1 · Answer 2 · answered Dec 15 '15 at 08:21

You can use the following code:

    // XDocument does not implement IDisposable, so it cannot go in a using block.
    XDocument xdoc = XDocument.Load(@"C:\Users\aks\Desktop\sample.xml");
    var customers = xdoc.Descendants("customer");
    var totalNodes = customers.Count();

    // Count the elements of each kind that have a non-empty value.
    var filledNames = customers.Descendants("name").Count(x => x.Value != string.Empty);
    var filledAges = customers.Descendants("age").Count(x => x.Value != string.Empty);
    var filledGenders = customers.Descendants("gender").Count(x => x.Value != string.Empty);

    // Everything that is not filled counts as blank.
    var unfilledNames = totalNodes - filledNames;
    var unfilledAges = totalNodes - filledAges;
    var unfilledGenders = totalNodes - filledGenders;

I thought of using XDocument, but the problem is that it loads the whole XML file into memory, and as I said, I am dealing with very large XML files, so I would prefer not to do that. [Read this](http://stackoverflow.com/questions/8096564/xmltextreader-vs-xdocument). Thanks for the help, though; I will keep this as my last option. – Aamir Jamal, Dec 15 '15 at 08:46

score 0 · Answer 3 · answered Dec 15 '15 at 08:03

Try this logic. For now I have only handled one element, name:

        XDocument xl = XDocument.Load(@"C:\Usr\sample.xml");
        // XML names are case-sensitive, so they must match the lowercase tags in the file.
        var customers = xl.Descendants("customer");
        var customerCount = customers.Count();
        // The cast yields null for a missing <name>, so that case counts as not filled.
        var filledCustomers = customers.Count(x => !string.IsNullOrEmpty((string)x.Element("name")));
        var nonfilledCustomers = customerCount - filledCustomers;

answered by cpr43

I thought of using XDocument, but the problem is that it loads the whole XML file into memory, and as I said, I am dealing with very large XML files, so I would prefer not to do that. [Read this](http://stackoverflow.com/questions/8096564/xmltextreader-vs-xdocument). Thanks for the help, though; I will keep this as my last option. – Aamir Jamal, Dec 15 '15 at 08:37
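One way to address the memory concern while keeping LINQ to XML is to let XmlReader drive the file and materialize one customer at a time with XNode.ReadFrom. The sketch below is not from any of the answers; the CustomerStream class, the Records method, and the recordName parameter are hypothetical names, and it assumes the customer elements are not nested inside each other.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

static class CustomerStream
{
    // Yields each record element as an XElement, holding only one in memory at a time.
    public static IEnumerable<XElement> Records(XmlReader reader, string recordName)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == recordName)
            {
                // ReadFrom consumes the whole element and leaves the reader on the
                // node that follows it, so Read() must not be called again here.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }
}
```

Each yielded element can then be checked with the same expressions as in the XDocument answers, for example string.IsNullOrEmpty((string)customer.Element("age")), without ever loading the whole document.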
