I have an XML file in this format (but only much bigger) :
<customer>
<name>John</name>
<age>24</age>
<gender>M</gender>
</customer>
<customer>
<name>Keith</name>
<age></age> <!--blank value-->
<gender>M</gender>
</customer>
<customer>
<name>Jenny</name>
<age>21</age>
<gender>F</gender>
</customer>
<customer>
<name>John</name>
<age>24</age> <!--blank value-->
<gender>M</gender> <!--blank value-->
</customer>
I want to generate a DataTable which will be in this format :
Element Name Value Frequency
name filled 4
name blank 0
age filled 2
age blank 2
gender filled 3
gender blank 1
Currently I am completing this task in 2 parts, first creating a DataTable structure as above and setting all the frequencies to 0 as default. And then reading the XML using XmlReader, and increasing the count of the frequency everytime XmlReader finds a child element.
My problem is that the second function that I use for adding the actual count is taking too long for very big Xml files with many customers having many attributes. How can I improve the efficiency of this function?
My code :
static void AddCount(DataTable dt)
{
int count;
using (XmlReader reader = XmlReader.Create(@"C:\Usr\sample.xml"))
{
while (reader.Read())
{
if (reader.IsStartElement())
{
string eleName = reader.Name;
DataRow[] foundElements = dt.Select("ElementName = '" + eleName + "'");
if (!reader.IsEmptyElement)
{
count = int.Parse(foundElements.ElementAt(0)["Frequency"].ToString());
foundElements.ElementAt(0).SetField("Frequency", count + 1);
}
else
{
count = int.Parse(foundElements.ElementAt(0)["Frequency"].ToString());
foundElements.ElementAt(0).SetField("Frequency", count + 1);
}
}
}
}
}
I am also ready to change the XmlReader class for any other more efficient class for this task. Any advice is welcome.