List XPath of all nodes in C#

Question

I'm a total newbie with XPath... I was hoping that, given an arbitrary HTML document, I could extract a list of XPath expressions for all nodes. For example:

    html
    html/head
    html/head/title
    html/body
    html/body/div
    html/body/div/p
    ...

This is an SSCCE to illustrate what I want:

    static void Main(string[] args)
    {
        String html = @"
        <html>
        <head>
            <title>Test</title>
        </head>
        <body>
            <div>
                <p>Test2</p>
            </div>
        </body>
        </html>
        ";

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(html);

        foreach (XmlNode node in doc.ChildNodes)
            ExamineNode(node);

    }

    static void ExamineNode(XmlNode node)
    {
        Console.WriteLine(/* WHAT TO PUT HERE */); // I want to show the path to this node

        foreach (XmlNode childNode in node.ChildNodes)
            ExamineNode(childNode);
    }

I just don't know what attribute to use, or how to compute the path. One method might be to use the node name and build a string while traversing nodes... but I thought there might be a better way. I'm looking for the best way to do this.

Similar questions have been asked here and here, but I'm looking for tips on how to implement this in C# in as simple a manner as possible.

What have you actually tried, code-wise? How are you currently extracting the nodes via XPath, as you previously stated? — MethodMan, Feb 13 '13 at 19:04
“I don't want to understand it, I just want to analyze it.”—huh? — Ondrej Tucny, Feb 13 '13 at 19:04

score 2 · Accepted Answer · edited May 23 '17 at 11:49

I found a somewhat similar question, but it had no simple answer such as a built-in node.Path property (which is what I was hoping for), so I went ahead and wrote a quick-and-dirty implementation.

Here's the code I ended up going with:

    static void Main(string[] args)
    {
        String html = @"
        <html>
        <head>
            <title>Test</title>
        </head>
        <body>
            <div>
                <p>Test2</p>
            </div>
        </body>
        </html>
        ";

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(html);

        foreach (XmlNode node in doc.ChildNodes)
            ExamineNode(node, "");

        Console.ReadLine();
    }

    static void ExamineNode(XmlNode node, String parentPath)
    {
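        // Build this node's path by appending its name to the parent's path.
        String nodePath = parentPath + '/' + node.Name;

        // Text nodes are skipped: they are not printed and have no children to visit.
        if (!(node is XmlText))
        {
            Console.WriteLine(nodePath); // prints the path to this node, e.g. /html/head/title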

            foreach (XmlNode childNode in node.ChildNodes)
                ExamineNode(childNode, nodePath);
        }
    }

It might not be the most efficient (e.g. does not use StringBuilder), but it's simple and up to the required task.
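
One limitation of the version above is that sibling elements with the same name get identical paths: two p elements inside one div would both print as /html/body/div/p. If unique paths are needed, one option is to append a 1-based position to each step. The following is a minimal sketch under that assumption. XPathLister and ListPaths are names made up for illustration (they are not part of System.Xml), and the sketch only looks at element nodes and ignores namespaces.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class XPathLister
{
    // Hypothetical helper: yields one positional XPath per element, in document order.
    public static IEnumerable<string> ListPaths(XmlNode node, string parentPath = "")
    {
        // Tracks how many element children with each name have been seen so far,
        // so every sibling gets its own 1-based position.
        var seen = new Dictionary<string, int>();

        foreach (XmlNode child in node.ChildNodes)
        {
            if (child.NodeType != XmlNodeType.Element)
                continue; // skip text, comments, the XML declaration, etc.

            seen.TryGetValue(child.Name, out int count); // count is 0 if the name is new
            seen[child.Name] = ++count;

            string path = parentPath + "/" + child.Name + "[" + count + "]";
            yield return path;

            // Recurse into the element's own children.
            foreach (string descendant in ListPaths(child, path))
                yield return descendant;
        }
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<html><head><title>Test</title></head>" +
                    "<body><div><p>Test2</p><p>Test3</p></div></body></html>");

        foreach (string path in ListPaths(doc))
            Console.WriteLine(path);
        // Prints:
        // /html[1]
        // /html[1]/head[1]
        // /html[1]/head[1]/title[1]
        // /html[1]/body[1]
        // /html[1]/body[1]/div[1]
        // /html[1]/body[1]/div[1]/p[1]
        // /html[1]/body[1]/div[1]/p[2]
    }
}
```

For a document without namespaces, each printed path can be passed back to doc.SelectSingleNode to retrieve that exact element, which is a quick way to check the output.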

Just hoping someone finds this useful someday.
