2

I want to use the web page currently loaded in Internet Explorer as an HtmlDocument in HtmlAgilityPack. I access the Explorer document through mshtml as a COM object.

mshtml.HTMLDocument doc = explorer.Document as mshtml.HTMLDocument;

Then I tried to convert it to the HtmlDocument type used by HtmlAgilityPack:

HtmlAgilityPack.HtmlDocument hdoc = (HtmlAgilityPack.HtmlDocument)doc;

But this fails with an invalid cast exception. The exception message is shown below.

[Image: exception message]

In any case, I want to use the currently loaded web page as the source for HtmlAgilityPack. I know I can use the HtmlWeb class provided by HtmlAgilityPack to load the current URL, but I also want to highlight elements in the loaded page (elements found using HtmlAgilityPack), and I don't think that can be done with that approach. Any ideas or help on how to implement this would be great. Thanks.

Ruwanka De Silva

1 Answer

4

Of course you can't cast between mshtml.HTMLDocument and HtmlAgilityPack.HtmlDocument. They're completely distinct classes from different libraries: the HtmlAgilityPack one is purely managed, while the mshtml one is a managed wrapper around a COM object.

What you can do is grab the HTML from the mshtml.HTMLDocument and load it into the Agility Pack.

Probably something along these lines:

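  // Get the live document from Internet Explorer through its COM interface.
  mshtml.IHTMLDocument3 sourceDoc = (mshtml.IHTMLDocument3) explorer.Document;
  // Serialize the whole page, including the <html> element itself.
  string documentContents = sourceDoc.documentElement.outerHTML;

  // Parse the serialized markup into a separate Agility Pack document.
  HtmlAgilityPack.HtmlDocument targetDoc = new HtmlAgilityPack.HtmlDocument();
  targetDoc.LoadHtml(documentContents);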

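You could also use the document's IPersistStream interface: call its Save method to write the document to a stream, then feed the result to HtmlAgilityPack. Note that Save expects a COM IStream, so a MemoryStream would have to be wrapped rather than passed directly.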
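Once the markup is loaded, it can be queried like any other Agility Pack document. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The class `AgilityPackSketch`, the helper `HasNode`, and the sample markup are hypothetical names and data introduced for illustration only; in the real program the HTML string would come from `sourceDoc.documentElement.outerHTML`.

```csharp
using System;
using HtmlAgilityPack;

public static class AgilityPackSketch
{
    // Hypothetical helper: parses the markup and reports whether
    // at least one node matches the given XPath expression.
    public static bool HasNode(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // SelectSingleNode returns null when nothing matches.
        return doc.DocumentNode.SelectSingleNode(xpath) != null;
    }

    public static void Main()
    {
        // Stand-in for sourceDoc.documentElement.outerHTML (assumed sample data).
        string html = "<html><body><div id=\"main\"><p>Hello</p></div></body></html>";

        Console.WriteLine(HasNode(html, "//div[@id='main']/p"));
        Console.WriteLine(HasNode(html, "//table"));
    }
}
```

Keep in mind that the Agility Pack document is a detached copy of the page: changes made to it do not show up in Internet Explorer unless they are written back to the mshtml document.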

jessehouwing
  • Thanks for the answer. I have already done that and achieved what I wanted, but the next step is to add styles to the found elements in the actual Explorer page. Do you have a suggestion for how to do that? – Ruwanka De Silva Aug 26 '14 at 08:48
  • `mshtml.HTMLDocument doc = explorer.Document; mshtml.IHTMLDocument3 idoc = (mshtml.IHTMLDocument3)doc; String str = idoc.documentElement.innerHTML; HtmlAgilityPack.HtmlDocument hdoc = new HtmlAgilityPack.HtmlDocument(); hdoc.LoadHtml(str);` This is the code I have used. – Ruwanka De Silva Aug 26 '14 at 08:49
  • Updated with your answer. I don't really understand your question: do you update the Agility Pack document and then want to see the updates in Explorer? – jessehouwing Aug 27 '14 at 09:22
  • In that case `sourceDoc.documentElement.outerHTML = targetDoc.DocumentNode.OuterHtml;` will probably work :). – jessehouwing Aug 27 '14 at 09:25
  • Yeah, I've done that, but it takes a lot of time when trying to find elements that are deeper in the hierarchy. I think I'll have to filter elements before doing the comparison. So far I've been able to eliminate elements that don't have the same tag name, which improves performance. Any ideas for gaining more performance? – Ruwanka De Silva Aug 27 '14 at 12:17
  • To clear up the misunderstanding: I want to confirm that an element with a specific XPath exists (I'm doing that through HtmlAgilityPack), and then highlight it in the actual web page that is already loaded in Explorer. – Ruwanka De Silva Aug 27 '14 at 12:24
  • Can you post a new question on that? Explain what you want, show what you have, tell what you've tried. That way there is a clear new starting point. – jessehouwing Aug 27 '14 at 13:54