XPath performance in XML reading

Question

I wrote a class in WinForms to retrieve some data from XML files. My XML files are mostly large, around 5 to 10 MB. I am not satisfied with the performance of my code, as processing sometimes seems to take forever. Please review my code and point out what I am doing wrong.

The code below is a sample of this class which loads a set of points for drawing a curve:

class TestXML
{
    // Fields
    XmlDocument ztr = new XmlDocument();
    XPathDocument doc;
    XPathNavigator nav;
    XmlNamespaceManager ns;

    string filePath;

    public TestXML(string pathToFile)
    {
        this.filePath = pathToFile;

        ztr.Load(filePath);
        doc = new XPathDocument(filePath);
        nav = doc.CreateNavigator();
        ns = new XmlNamespaceManager(nav.NameTable);
    }

    public double[,] GetCurveDataPost(string testType, string groupName, string subTestType, string pinName, string testNameContains = "Post VI")
    {
        List<double> voltage = new List<double>();
        List<double> current = new List<double>();

        XPathNodeIterator volt = nav.Select("/Document/Tests/Test[contains(Name, '" + testNameContains + "') and Type='" + testType + "']/Groups/Group[Name='" + groupName + "']/CurvesFileData/Pins/Pin[Number='" + pinName + "']/Curves//Curve/VIPairs/VIPair/Voltage");
        XPathNodeIterator curr = nav.Select("/Document/Tests/Test[contains(Name, '" + testNameContains + "') and Type='" + testType + "']/Groups/Group[Name='" + groupName + "']/CurvesFileData/Pins/Pin[Number='" + pinName + "']/Curves//Curve/VIPairs/VIPair/Current");

        foreach (XPathNavigator value in volt)
        {
            voltage.Add(Convert.ToDouble(value.Value));
        }

        foreach (XPathNavigator value in curr)
        {
            current.Add(Convert.ToDouble(value.Value));
        }

        double[,] data = new double[voltage.Count(), 2];
        for (int i = 0; i < voltage.Count(); i++)
        {
            data[i, 0] = voltage[i];
            data[i, 1] = current[i];
        }

        return data;
    }
}

I can load multiple XML files using this class (for example, from a TreeView in which each node's Name property holds the path to an XML file), but it is not at all time-efficient. Is there a way to make it faster? Could I load the XML files into memory first and run the queries later? I am concerned that this would lead to high memory usage.

Please let me know if you need more information.

score 2 · Accepted Answer · edited May 23 '17 at 12:09

You should profile your code to identify which parts of it are taking a long time.

This can be as simple as a few Debug.WriteLine("MyMarker " + DateTime.Now) (or similar) statements dotted around the code, or you can use a profiling tool.
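As a rough sketch of the marker approach, a Stopwatch gives finer resolution than DateTime.Now. The constructor in the question parses the file twice (once into the XmlDocument, once into the XPathDocument), so the sketch times each stage separately, plus the XPath query itself. The LoadTiming class, its Measure method and the xpath parameter are hypothetical names used only for illustration.

```csharp
using System.Diagnostics;
using System.Xml;
using System.Xml.XPath;

static class LoadTiming
{
    // Hypothetical helper: times each stage separately so the slow one stands out.
    public static void Measure(string filePath, string xpath)
    {
        Stopwatch sw = Stopwatch.StartNew();
        XmlDocument ztr = new XmlDocument();
        ztr.Load(filePath);
        Debug.WriteLine("XmlDocument.Load: " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        XPathDocument doc = new XPathDocument(filePath);
        XPathNavigator nav = doc.CreateNavigator();
        Debug.WriteLine("XPathDocument load: " + sw.ElapsedMilliseconds + " ms");

        // Select returns an iterator, so walk it to include the evaluation cost.
        sw.Restart();
        int count = 0;
        XPathNodeIterator it = nav.Select(xpath);
        while (it.MoveNext())
        {
            count++;
        }
        Debug.WriteLine("Select (" + count + " nodes): " + sw.ElapsedMilliseconds + " ms");
    }
}
```

Debug.WriteLine only produces output in a Debug build, so run the measurement from the debugger or replace it with another logging call.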

That said, the slowdown is likely on this line:

ztr.Load(filePath); // ztr is a XmlDocument

The XmlDocument class reads and parses the whole XML file in order to load the document, even if the information you desire is actually near the start of the file - for large files this can be relatively inefficient.

You should look into using the XmlReader class to read your XML document instead. It allows you to read the document element by element on an as-needed basis, and so can be considerably quicker for reading data from large XML documents. The trade-off, however, is that it is more difficult to use than the XmlDocument class.
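One way to soften that trade-off is a hybrid: stream through the file with XmlReader, but materialize one Test element at a time with XNode.ReadFrom and filter it with LINQ to XML. The sketch below is untested against the real files and rests on several assumptions: the element names are taken from the XPath in the question, the XML uses no namespaces, every VIPair has both a Voltage and a Current child, and Test elements occur only under /Document/Tests. StreamingCurveReader and GetCurveData are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class StreamingCurveReader
{
    public static double[,] GetCurveData(string filePath, string testType, string groupName,
                                         string pinName, string testNameContains = "Post VI")
    {
        List<double> voltage = new List<double>();
        List<double> current = new List<double>();

        using (XmlReader reader = XmlReader.Create(filePath))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                // Skip everything that is not the start of a <Test> element.
                if (reader.NodeType != XmlNodeType.Element || reader.LocalName != "Test")
                {
                    reader.Read();
                    continue;
                }

                // ReadFrom consumes the whole <Test> subtree and leaves the reader
                // on the node after </Test>, so no extra Read() is needed here.
                XElement test = (XElement)XNode.ReadFrom(reader);

                string name = (string)test.Element("Name") ?? "";
                if (!name.Contains(testNameContains) || (string)test.Element("Type") != testType)
                {
                    continue; // not the test we want; its subtree can be collected
                }

                // Same path as the XPath in the question, applied to this one <Test>.
                IEnumerable<XElement> pairs = test
                    .Elements("Groups").Elements("Group")
                    .Where(g => (string)g.Element("Name") == groupName)
                    .Elements("CurvesFileData").Elements("Pins").Elements("Pin")
                    .Where(p => (string)p.Element("Number") == pinName)
                    .Elements("Curves").Descendants("Curve")
                    .Elements("VIPairs").Elements("VIPair");

                foreach (XElement pair in pairs)
                {
                    // Assumes both children exist; Convert.ToDouble mirrors the original code.
                    voltage.Add(Convert.ToDouble(pair.Element("Voltage").Value));
                    current.Add(Convert.ToDouble(pair.Element("Current").Value));
                }
            }
        }

        double[,] data = new double[voltage.Count, 2];
        for (int i = 0; i < voltage.Count; i++)
        {
            data[i, 0] = voltage[i];
            data[i, 1] = current[i];
        }
        return data;
    }
}
```

With this shape, only one Test subtree is held in memory at a time rather than the whole document. Whether it is actually faster for a given file is something the profiling step should confirm.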
