I'm attempting to write a web scraping app and am having problems with a site that runs some JavaScript to generate the data I require after the page has loaded.

The page runs this JavaScript once it has finished loading:

$(document).ready(function() {
    $("#periodSelect, #typeSelect").change(spotSystemPrice.load);
    spotSystemPrice.load();
});

When that finishes, it populates a div (id="spotSystemPriceOutput") with the data.

I tried doing this using only the WebBrowser class, but InvokeScript only lets you call global functions, not methods on an object. The closest solution I have found so far is to insert some JavaScript using MSHTML.dll that invokes the method. This seems to be working, but I need some help loading the div output into a string, or I can even work with the full body HTML. I'm very new to C#, so I'm completely out of my depth with this. I think the final step I need is going to be really easy, so I just need one of you experts to help me along :)

Here is the code I'm working with. Any suggestions to help me finish it, or even a completely different solution, would be greatly appreciated.

WebBrowser wb = new WebBrowser();
wb.Navigate(URL);
// Pump messages until the initial page load completes
while (wb.ReadyState != WebBrowserReadyState.Complete)
    Application.DoEvents();
// Inject a <script> element into <head> that calls the method
var doc = (IHTMLDocument2)wb.Document.DomDocument;
var headItems = (IHTMLElementCollection)doc.all.tags("head");
var scriptObject = (IHTMLScriptElement)doc.createElement("script");
scriptObject.type = @"text/javascript";
scriptObject.text = "spotSystemPrice.load();";
var node = (IHTMLDOMNode)headItems.item(null, 0);
node.appendChild((IHTMLDOMNode)scriptObject);

Interestingly, if I change my JavaScript injection to "spotSystemPrice.load(); alert('');", then after clicking OK on the message box I can see the results in the object explorer using the text visualiser, which gives me an expression reference of ((((mshtml.HTMLHeadElementClass)(node)).document).body).innerHTML. How would adding alert to the JavaScript change my results? Do I need to wait for some kind of onComplete event?

Update: I also found this, which looked useful: "Calling javascript object method using WebBrowser.Document.InvokeScript". I modified my code to:

WebBrowser wb = new WebBrowser();
wb.Navigate(URL);
while (wb.ReadyState != WebBrowserReadyState.Complete)
    Application.DoEvents();
string JScript = "spotSystemPrice.load();";
object[] args = { JScript };
wb.Document.InvokeScript("eval", args);
while (wb.ReadyState != WebBrowserReadyState.Complete)
    Application.DoEvents();

But that still gives me no data in the div element. For some reason, though, if I change my JavaScript to "alert('');" and don't even try to invoke the method, the data I need is there! What's going on? I'm so confused.

  • possible duplicate of [MSHTML : Calling member of javascript object?](http://stackoverflow.com/questions/9277839/mshtml-calling-member-of-javascript-object) – Erik Philips Apr 16 '14 at 21:03
  • This does look similar. However, as I mentioned above, I'm very new to programming and C#, so I can't translate that solution to my problem. The code I've written above was copied and amended from another article on here. I just need a little help pulling out the new HTML now that the JavaScript has been invoked. – user3542912 Apr 16 '14 at 21:13

1 Answer

If the function spotSystemPrice.load already exists in the page's script, you can try InvokeScript:

wb.Document.InvokeScript("spotSystemPrice.load");

Notice that I'm not using the () at the end of the function name. Because more data will be loaded after this call, you need to wait again until it has all loaded, so reusing your code:

while (wb.ReadyState != WebBrowserReadyState.Complete) Application.DoEvents();

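You can also wait until wb.Document.Body is not null and/or your div contains data. Note that InnerHtml can be null while the div is still empty, so check for null or empty rather than reading its length:

while (wb.Document.Body == null) Application.DoEvents();
while (string.IsNullOrEmpty(wb.Document.GetElementById("spotSystemPriceOutput")?.InnerHtml)) Application.DoEvents();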

At this point I would also implement some kind of time-out so it won't get stuck there forever if something fails, but it could be worth a try.
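
As a rough illustration of that time-out idea, here is a minimal sketch. BrowserWait and WaitUntil are hypothetical names of my own, not part of the WebBrowser API, and the 10 ms pause and the 30-second limit in the usage comment are arbitrary assumptions. The helper only polls a condition and gives up after a deadline; it does not by itself make the script run.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class BrowserWait
{
    // Hypothetical helper (not part of WebBrowser): polls condition() until it
    // returns true or the timeout elapses. pump is called between checks so the
    // caller can keep the UI message loop running (e.g. Application.DoEvents).
    // Returns true if the condition was met, false if the wait timed out.
    public static bool WaitUntil(Func<bool> condition, TimeSpan timeout, Action pump)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (!condition())
        {
            if (clock.Elapsed >= timeout)
                return false;
            pump();
            Thread.Sleep(10); // brief pause so the loop does not spin at 100% CPU
        }
        return true;
    }
}

// Usage with the WebBrowser from the question (the 30-second limit is arbitrary):
//
// bool loaded = BrowserWait.WaitUntil(() =>
// {
//     HtmlElement div = wb.Document == null
//         ? null
//         : wb.Document.GetElementById("spotSystemPriceOutput");
//     return div != null && !string.IsNullOrEmpty(div.InnerHtml);
// }, TimeSpan.FromSeconds(30), Application.DoEvents);
//
// string html = loaded
//     ? wb.Document.GetElementById("spotSystemPriceOutput").InnerHtml
//     : null;
```

Passing the message pump in as a parameter keeps the helper free of any Windows Forms dependency, and returning false on time-out lets the caller decide whether to retry, log, or give up.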

Javi
  • I've tried doing it this way, but after executing I'm still getting no data with: wb.Document.GetElementById("spotSystemPriceOutput").InnerHtml or viewing the whole page with: wb.Document.Body.InnerHtml – user3542912 Apr 16 '14 at 21:42
  • After invoking it, if it "navigates" or loads external data you will need to wait until the new content is loaded. Check WebBrowserReadyState.Complete as you did before. – Javi Apr 16 '14 at 21:46
  • Sorry, I should have mentioned that I also tried waiting for ReadyState.Complete after invoking the script, but that didn't work. The div element was still blank. – user3542912 Apr 16 '14 at 21:51