Custom tag not considered HTMLelement/parent in webbrowser DOM c#

Question

I am working on an XPath generator (using absolute paths). The idea is that I have a function to which you pass an HtmlElement (found in the WebBrowser control) and it returns the XPath, like:

/html/body/div[3]/div[1]/a

The function to generate the xpath looks something like this:

HtmlElement node = ...;
var xpath = new StringBuilder(); // requires using System.Text;
while (node != null)
{
    // 1-based index of the current node among its parent's children
    int i = FindElementIndex(node);
    if (i == 1)
        xpath.Insert(0, "/" + node.TagName.ToLower());
    else
        xpath.Insert(0, "/" + node.TagName.ToLower() + "[" + i + "]");
    node = node.Parent;
}

The idea is this:

a) Take the element.

b) Find the index position of the element within element.Parent.

c) Prepend the resulting step to the XPath, then move up to the parent and repeat.
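
For illustration, the steps above might be sketched as follows. This is only a minimal sketch under stated assumptions: Node is a hypothetical stand-in for the WebBrowser HtmlElement (it exposes only TagName, Parent and Children), and FindElementIndex is assumed to count siblings that share the same tag name, which the snippet above does not spell out. XPathSketch and BuildXPath are names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for HtmlElement: only the members the walk needs.
public class Node
{
    public string TagName { get; }
    public Node Parent { get; private set; }
    public List<Node> Children { get; } = new List<Node>();

    public Node(string tagName) { TagName = tagName; }

    // Attaches a child, sets its Parent, and returns the child for chaining.
    public Node Add(Node child)
    {
        child.Parent = this;
        Children.Add(child);
        return child;
    }
}

public static class XPathSketch
{
    // Assumption: the index is the 1-based position of the node among
    // siblings with the same tag name (the usual XPath convention).
    public static int FindElementIndex(Node node)
    {
        if (node.Parent == null) return 1; // root has no siblings
        int index = 0;
        foreach (Node sibling in node.Parent.Children)
        {
            if (string.Equals(sibling.TagName, node.TagName,
                              StringComparison.OrdinalIgnoreCase))
                index++;
            if (ReferenceEquals(sibling, node)) return index;
        }
        return 1; // only reached if the parent/child links are inconsistent
    }

    // Walks from the node up to the root, prepending one step per level.
    public static string BuildXPath(Node node)
    {
        var xpath = new StringBuilder();
        while (node != null)
        {
            int i = FindElementIndex(node);
            string step = "/" + node.TagName.ToLowerInvariant();
            if (i > 1) step += "[" + i + "]"; // index omitted when it is 1
            xpath.Insert(0, step);
            node = node.Parent;
        }
        return xpath.ToString();
    }
}
```

The walk only ever sees what Parent reports, so the generated path is only as accurate as the parent chain of the underlying DOM.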

The problem appears when the parent is a custom HTML tag like <layer>. Example:

<html>
  <body>
     <div>
        <layer>
           <a href="http://site.com">aaa</a>
        </layer>
      </div>
  </body>
</html>

If our HtmlElement is <a href="http://site.com">aaa</a> and we call ourelement.Parent, it returns the DIV element and not the <layer> element.

So instead of having: /html/body/div/layer/a

We will have (which is incorrect) /html/body/div/a

How can this be solved? Really hope someone can help figure this out.

EDIT 1: Just for testing purposes I implemented the function from Get the full path of a node, after get it with an XPath query in JavaScript

The result was that if the page contained a "custom" tag (like <layer>) and was opened in Firefox, the XPath was shown correctly.

If the page was opened in Internet Explorer (which is what the WebBrowser control uses), the <layer> was not included as a parent.

So the issue is that Internet Explorer does not parse the DOM correctly. What is the solution? What function can help create an XPath in cases like this (when using the WebBrowser HtmlElement)?

You may be interested in this XSLT solution: http://stackoverflow.com/questions/4746299/generate-get-xpath-from-xml-node-java — Dimitre Novatchev, Jun 08 '12 at 03:54

score 0 · Accepted Answer · answered Jun 08 '12 at 00:58


This is not a direct answer to your question, but have you considered using http://htmlagilitypack.codeplex.com/ to load the HTML? It will not have the problem of ignoring the element.


Richard Schneider


I am using the HtmlAgilityPack, but I have a DOM selector (similar to Firebug) that selects an HtmlElement from the WebBrowser. I then need to get the HtmlNode (HtmlAgilityPack node) that corresponds to the selected HtmlElement. I first tried comparing the OuterHtml of the elements, but the WebBrowser DOM's OuterHtml is sometimes different from the HtmlAgilityPack OuterHtml (the browser reformats the text). That's why I need the XPath of the HtmlElement, so I can find the HtmlAgilityPack HtmlNode. Hope it makes sense – BlasterGod Jun 08 '12 at 01:12
