I'm a total newbie with XPath... I was hoping that, given an arbitrary HTML document, I could extract a list of XPath expressions for all nodes. For example:

html
html/head
html/head/title
html/body
html/body/div
html/body/div/p
...

This is an SSCCE to illustrate what I want:

    static void Main(string[] args)
    {
        String html = @"
        <html>
        <head>
            <title>Test</title>
        </head>
        <body>
            <div>
                <p>Test2</p>
            </div>
        </body>
        </html>
        ";

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(html);

        foreach (XmlNode node in doc.ChildNodes)
            ExamineNode(node);

    }

    static void ExamineNode(XmlNode node)
    {
        Console.WriteLine(/* WHAT TO PUT HERE */); // I want to show the path to this node

        foreach (XmlNode childNode in node.ChildNodes)
            ExamineNode(childNode);
    }

I just don't know what attribute to use, or how to compute the path. One method might be to use the node name and build a string while traversing nodes... but I thought there might be a better way. I'm looking for the best way to do this.
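As far as I can tell, `XmlNode` has no built-in property that returns a node's path, so the string-building idea seems to be the practical route. A minimal sketch of it, assuming name-only paths are enough and the node is an element: walk up `ParentNode` until the document is reached, then join the collected names. `NodePathSketch` and `GetPath` are made-up names for illustration, not part of `System.Xml`.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class NodePathSketch
{
    // Hypothetical helper (not a System.Xml API): builds a name-only path such
    // as "html/body/div" by walking from the node up to the document root.
    // Returns an empty string when the starting node is not an element.
    public static string GetPath(XmlNode node)
    {
        var names = new List<string>();
        for (var current = node;
             current != null && current.NodeType == XmlNodeType.Element;
             current = current.ParentNode)
        {
            names.Add(current.Name);
        }
        names.Reverse(); // names were collected leaf-first, so flip to root-first
        return string.Join("/", names);
    }
}
```

With this in place, the placeholder in `ExamineNode` could become `Console.WriteLine(NodePathSketch.GetPath(node));`. It repeats the upward walk for every node, so passing the parent's path down during the traversal is cheaper on large documents.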

Similar questions have been asked here and here, but I'm looking for tips on how to implement this in C# in as simple a manner as possible.

Community
  • 1
  • 1
Gigi
  • 28,163
  • 29
  • 106
  • 188

1 Answer

I found a somewhat similar question, but it had no simple answer along the lines of a built-in node.Path property (which is what I was hoping for), so I went ahead and wrote a quick and dirty implementation.

Here's the code I ended up going with:

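    using System;
    using System.Xml;

    class Program
    {
        static void Main(string[] args)
        {
            String html = @"
            <html>
            <head>
                <title>Test</title>
            </head>
            <body>
                <div>
                    <p>Test2</p>
                </div>
            </body>
            </html>
            ";

            // LoadXml only accepts well-formed XML; this sample happens to qualify.
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(html);

            // Start from the top-level nodes with an empty parent path.
            foreach (XmlNode node in doc.ChildNodes)
                ExamineNode(node, "");

            Console.ReadLine(); // keep the console window open
        }

        static void ExamineNode(XmlNode node, String parentPath)
        {
            // Only elements get a path; text, comments and other node types are skipped.
            if (node.NodeType != XmlNodeType.Element)
                return;

            String nodePath = parentPath + '/' + node.Name;
            Console.WriteLine(nodePath); // e.g. /html/body/div/p

            // Recurse, passing this node's path down as the children's prefix.
            foreach (XmlNode childNode in node.ChildNodes)
                ExamineNode(childNode, nodePath);
        }
    }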

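It might not be the most efficient (e.g. it concatenates strings instead of using a StringBuilder), but it's simple and does the job. One caveat: XmlDocument.LoadXml only accepts well-formed XML, so this works for XHTML-style markup like the sample but will throw on HTML that isn't well-formed.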
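One limitation worth noting: paths built from element names alone are not unique when siblings share a name (two `div` elements under `body` both come out as `/html/body/div`). A possible variation, sketched here under the assumption that the document uses no XML namespaces, appends a 1-based position to each step so that every path selects exactly one element. `IndexedPathSketch` and `ListPaths` are hypothetical names for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class IndexedPathSketch
{
    // Hypothetical helper (not a System.Xml API): returns one absolute XPath
    // per element, with a 1-based position on each step (e.g. div[2]) so that
    // same-named siblings stay distinct. Assumes no XML namespaces are in use.
    public static List<string> ListPaths(XmlDocument doc)
    {
        var paths = new List<string>();
        if (doc.DocumentElement != null)
            Visit(doc.DocumentElement, "", 1, paths);
        return paths;
    }

    private static void Visit(XmlElement element, string parentPath,
                              int position, List<string> paths)
    {
        string path = parentPath + "/" + element.Name + "[" + position + "]";
        paths.Add(path);

        // Track how many child elements of each name have been seen so far.
        var seen = new Dictionary<string, int>();
        foreach (XmlNode child in element.ChildNodes)
        {
            if (child is XmlElement childElement)
            {
                seen.TryGetValue(childElement.Name, out int count);
                seen[childElement.Name] = count + 1;
                Visit(childElement, path, count + 1, paths);
            }
        }
    }
}
```

Each returned string can be passed back to `doc.SelectSingleNode(path)` to get the element it was generated from, which is a handy sanity check.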

Just hoping someone finds this useful someday.