I'm trying to scrape data from a website that populates a table when "$(document).ready" fires.
After my WebBrowser's ReadyState reaches Complete, the data is not present in the DIV element. I thought this might be because, although the document state is complete, the client-side JS may need a few more seconds to finish loading the data. So I tried the following: a timer, a while loop that waits until the DIV is populated with content, running the exe in IE8 and IE9 modes, and invoking the same JS method that is called when the page finishes loading. None of these gave me the data I needed.
Interestingly, if I add a MessageBox to my code, the DIV has its data once I dismiss it. It's driving me mad trying to figure out what is causing the change.

    using System;
    using System.Windows.Forms;

    // url and divname are defined elsewhere.
    static void Main(string[] args)
    {
        // The WebBrowser control requires a single-threaded apartment (STA) thread.
        System.Threading.Thread t = new System.Threading.Thread(ThreadStart);
        t.SetApartmentState(System.Threading.ApartmentState.STA);
        t.Start();
        Console.WriteLine("Downloading page...");
        Console.ReadLine();
    }

    public static void ThreadStart()
    {
        WebBrowser wb = new WebBrowser();
        wb.Navigate(url);

        // Pump messages until the document reports that it has finished loading.
        while (wb.ReadyState != WebBrowserReadyState.Complete)
            Application.DoEvents();

        // Invoke the same JS method the page calls when it finishes loading.
        wb.Document.InvokeScript("spotSystemPrice.load");
        while (wb.ReadyState != WebBrowserReadyState.Complete)
            Application.DoEvents();

        string output1 = wb.Document.GetElementById(divname).InnerHtml; // blank
        MessageBox.Show("");
        string output2 = wb.Document.GetElementById(divname).InnerHtml; // has the data
    }

When this runs, output1 is blank and output2 has the data I need. Any ideas why showing the MessageBox causes the DIV to be populated? I'm sure it's not just a timing issue, because I experimented with many different timer intervals after the ReadyState was Complete.
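
The wait-until-populated loop mentioned above can be sketched as follows. This is only a minimal sketch under assumptions: `Polling.WaitUntil` is a hypothetical helper name (it is not part of the WebBrowser API), and the timeout is arbitrary. It repeatedly calls a message-pumping action until a condition holds or the timeout elapses.

```csharp
using System;
using System.Diagnostics;

public static class Polling
{
    // Hypothetical helper: calls `pump` repeatedly until `condition` returns true
    // or `timeout` elapses. Returns true if the condition was met, false on timeout.
    public static bool WaitUntil(Func<bool> condition, Action pump, TimeSpan timeout)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (!condition())
        {
            if (clock.Elapsed > timeout)
                return false;
            pump(); // e.g. Application.DoEvents in a WinForms context
        }
        return true;
    }
}
```

In ThreadStart this might be called as `Polling.WaitUntil(() => { var el = wb.Document.GetElementById(divname); return el != null && !string.IsNullOrEmpty(el.InnerHtml); }, Application.DoEvents, TimeSpan.FromSeconds(10));` so that messages keep being pumped while waiting. Whether that alone is enough for this page is exactly what is in question here.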