I have a program that goes through thousands of files and checks whether each one has the correct XML format. It takes ages to complete, and I think that's because of the type of XML reader I use.
The method below contains three versions that I tried. The first one is the fastest, but only by 5%. (The method does not need to check whether the file is XML.)
private bool HasCorrectXmlFormat(string filePath)
{
    try
    {
        //-Version 1----------------------------------------------------------------------------------------
        XmlReader reader = XmlReader.Create(filePath, new XmlReaderSettings() { IgnoreComments = true, IgnoreWhitespace = true });
        string[] elementNames = new string[] { "DocumentElement", "Protocol", "DateTime", "Item", "Value" };
        int i = 0;
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                if (reader.Name != elementNames.ElementAt(i))
                {
                    return false;
                }
                if (i >= 4)
                {
                    return true;
                }
                i++;
            }
        }
        return false;
        //--------------------------------------------------------------------------------------------------
        //- Version 2 ------------------------------------------------------------------------------------
        IEnumerable<XElement> xmlElements = XDocument.Load(filePath).Descendants();
        string[] elementNames = new string[] { "DocumentElement", "Protocol", "DateTime", "Item", "Value" };
        for (int i = 0; i < 5; i++)
        {
            if (xmlElements.ElementAt(i).Name != elementNames.ElementAt(i))
            {
                return false;
            }
        }
        return true;
        //--------------------------------------------------------------------------------------------------
        //- Version 3 ------------------------------------------------------------------------------------
        XDocument doc = XDocument.Load(filePath);
        if (doc.Root.Name != "DocumentElement")
        {
            return false;
        }
        if (doc.Root.Elements().First().Name != "Protocol")
        {
            return false;
        }
        if (doc.Root.Elements().First().Elements().ElementAt(0).Name != "DateTime")
        {
            return false;
        }
        if (doc.Root.Elements().First().Elements().ElementAt(1).Name != "Item")
        {
            return false;
        }
        if (doc.Root.Elements().First().Elements().ElementAt(2).Name != "Value")
        {
            return false;
        }
        return true;
        //--------------------------------------------------------------------------------------------------
    }
    catch (Exception)
    {
        return false;
    }
}
What I need is a faster way to do this. Is there a faster way to go through an XML file? I only have to check whether the first 5 elements have the correct names.
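For reference, here is a minimal, self-contained sketch of the forward-only approach from version 1, isolated from the other two versions. It is only an illustration under a few assumptions: the class name `XmlFormatCheck` and the `Stream` overload are hypothetical and not part of my program, and the sketch disposes the reader and the file handle with `using`, which the method above does not do. It stops reading as soon as five element names have matched or one has mismatched, so the rest of the file is never parsed.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class; the name is chosen for illustration only.
public static class XmlFormatCheck
{
    // Names the first five elements must have, in document order.
    private static readonly string[] ExpectedNames =
        { "DocumentElement", "Protocol", "DateTime", "Item", "Value" };

    // Checks XML supplied as a stream (hypothetical overload, convenient for testing).
    public static bool HasCorrectXmlFormat(Stream input)
    {
        var settings = new XmlReaderSettings
        {
            IgnoreComments = true,
            IgnoreWhitespace = true
        };

        try
        {
            // The using block releases the reader even on an early return.
            using (XmlReader reader = XmlReader.Create(input, settings))
            {
                int matched = 0;
                while (reader.Read())
                {
                    if (reader.NodeType != XmlNodeType.Element)
                    {
                        continue;
                    }
                    if (reader.Name != ExpectedNames[matched])
                    {
                        return false; // wrong element name at this position
                    }
                    matched++;
                    if (matched == ExpectedNames.Length)
                    {
                        return true; // first five elements are correct, stop reading
                    }
                }
            }
            return false; // document ended before five elements were seen
        }
        catch (XmlException)
        {
            return false; // input is not well-formed XML
        }
    }

    // Checks a file on disk; unreadable files count as incompatible.
    public static bool HasCorrectXmlFormat(string filePath)
    {
        try
        {
            using (FileStream file = File.OpenRead(filePath))
            {
                return HasCorrectXmlFormat(file);
            }
        }
        catch (IOException)
        {
            return false;
        }
        catch (UnauthorizedAccessException)
        {
            return false;
        }
    }
}
```

Calling `XmlFormatCheck.HasCorrectXmlFormat(file.FullName)` would take the place of the call to my original method. Whether this is measurably faster than version 1 as posted is something I have not verified.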
UPDATE
The XML files are only 2-5 KB in size, rarely more than that. The files are located on a local server. I am on a laptop that has an SSD.
Here are some test results:
I should also add that the files are filtered beforehand, so only XML files are passed to the method. I get the files with the following method:
public List<FileInfo> GetCompatibleFiles()
{
    return new DirectoryInfo(folderPath)
        .EnumerateFiles("*", searchOption)
        .AsParallel()
        .Where(file => file.Extension == ".xml" ? HasCorrectXmlFormat(file.FullName) : false)
        .ToList();
}
This method does not appear in my code like this (I put two methods together); it is just to show how the HasCorrectXmlFormat method is called. You don't have to correct this method, I know it can be improved.
UPDATE 2
Here are the two full methods mentioned at the end of update 1:
private void WriteAllFilesInList()
{
    allFiles = new DirectoryInfo(folderPath)
        .EnumerateFiles("*", searchOption)
        .AsParallel()
        .ToList();
}
private void WriteCompatibleFilesInList()
{
    compatibleFiles = allFiles
        .Where(file => file.Extension == ".xml" ? HasCorrectXmlFormat(file.FullName) : false)
        .ToList();
}
Both methods are only called once in the entire program (if either the allFiles or compatibleFiles list is null).
UPDATE 3
It seems that the WriteAllFilesInList method is the real problem, as shown here:
FINAL UPDATE
It seems my method doesn't need any improvement, as the bottleneck is something else.
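For anyone who wants to check where the time goes in a similar setup, a rough sketch of timing the two phases separately with `Stopwatch` could look like the following. The class `ScanTiming` and its methods `Measure` and `TimePhases` are hypothetical names introduced for this illustration; the enumeration and the filter mirror the two methods from update 2, with the format check passed in as a delegate.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

// Hypothetical helper class; the names are chosen for illustration only.
public static class ScanTiming
{
    // Runs an action once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed;
    }

    // Times the file enumeration and the XML filtering as two separate phases.
    public static (TimeSpan Enumerate, TimeSpan Filter, int Matches) TimePhases(
        string folderPath, SearchOption searchOption, Func<string, bool> isCompatible)
    {
        List<FileInfo> allFiles = null;
        List<FileInfo> compatibleFiles = null;

        // Phase 1: list every file, as WriteAllFilesInList does.
        TimeSpan enumerate = Measure(() =>
            allFiles = new DirectoryInfo(folderPath)
                .EnumerateFiles("*", searchOption)
                .ToList());

        // Phase 2: keep only .xml files that pass the format check.
        TimeSpan filter = Measure(() =>
            compatibleFiles = allFiles
                .Where(file => file.Extension == ".xml" && isCompatible(file.FullName))
                .ToList());

        return (enumerate, filter, compatibleFiles.Count);
    }
}
```

Passing `HasCorrectXmlFormat` as the `isCompatible` argument and comparing the two returned times shows which phase dominates. A single run is only a rough indication, since file system caching can change the numbers between runs.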