I'm trying to scrape some data from a website that populates a table when "$(document).ready" fires.

After my WebBrowser's ReadyState is Complete, the data is not present in the DIV element. I thought that, although the document state is complete, the JS invoked on the client might need a few more seconds to finish loading the data. So I tried a timer, a while loop that waits until the div is populated with content, running the exe in IE8 and IE9 mode, and invoking the same JS method that is called when the page finishes loading. None of these gave me the data I needed.

Interestingly, if I add a MessageBox to my code, the DIV has its data after I dismiss it. It's driving me mad trying to figure out what is causing the change.

    static void Main(string[] args)
    {
        System.Threading.Thread t = new System.Threading.Thread(ThreadStart);
        t.SetApartmentState(System.Threading.ApartmentState.STA);
        t.Start();
        Console.WriteLine("Downloading page...");
        Console.ReadLine();
    }

    public static void ThreadStart()
    {
        WebBrowser wb = new WebBrowser();
        wb.Navigate(url);
        while (wb.ReadyState != WebBrowserReadyState.Complete)
            Application.DoEvents();

        wb.Document.InvokeScript("spotSystemPrice.load");
        while (wb.ReadyState != WebBrowserReadyState.Complete)
            Application.DoEvents();

        string output1 = wb.Document.GetElementById(divname).InnerHtml;
        MessageBox.Show("");
        string output2 = wb.Document.GetElementById(divname).InnerHtml;
    }

When this runs, output1 is blank but output2 has the data I need. Any ideas what is causing the MessageBox prompt to populate the DIV? I'm sure it's not just a timing issue, because I experimented with many different timer intervals after the ReadyState was Complete.
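
One possibility worth testing, offered as an assumption rather than a confirmed diagnosis: MessageBox.Show runs its own modal message loop, so the browser control can process queued window messages while the dialog is open, whereas a wait that blocks the thread (for example with Thread.Sleep alone) does not pump them. The sketch below polls the DIV while calling Application.DoEvents between attempts. The names BrowserWait, PumpUntilContent and ReadInnerHtml are hypothetical helpers, not part of WinForms, and the 50 ms pause and the timeout are arbitrary choices.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Windows.Forms;

// Hypothetical helper class, introduced only for illustration.
static class BrowserWait
{
    // Calls readContent repeatedly, pumping pending window messages between
    // attempts, until it returns non-empty text or the timeout elapses.
    // Returns null on timeout.
    public static string PumpUntilContent(Func<string> readContent, TimeSpan timeout)
    {
        Stopwatch clock = Stopwatch.StartNew();
        do
        {
            Application.DoEvents();      // let the control process queued messages
            string content = readContent();
            if (!string.IsNullOrEmpty(content))
                return content;
            Thread.Sleep(50);            // short pause to avoid a busy loop
        } while (clock.Elapsed < timeout);
        return null;
    }

    // Reads InnerHtml of an element, tolerating a missing document or element.
    public static string ReadInnerHtml(WebBrowser wb, string elementId)
    {
        HtmlDocument doc = wb.Document;
        HtmlElement element = doc == null ? null : doc.GetElementById(elementId);
        return element == null ? null : element.InnerHtml;
    }
}
```

In ThreadStart, after the first ReadyState loop, this could be used as `string output = BrowserWait.PumpUntilContent(() => BrowserWait.ReadInnerHtml(wb, divname), TimeSpan.FromSeconds(30));`. A null result would mean the DIV stayed empty even with messages being pumped, which would point to a different cause.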

  • This may help: http://stackoverflow.com/a/22262976/1768303 – noseratio Apr 22 '14 at 00:01
  • Thanks. Will take a look at that. I'm a C# newbie so may take me a while though! :) I have figured out how to get round the problem. I read some more web scraping tutorials and was able to retrieve the data I need using a HttpWebRequest. By running a HttpAnalyzer add-on in my browser, I was able to see the post calls that are made to fetch the data. I replicated the request in code and it works! Will definitely take a look at your solution though, it's still bugging me why my original method wasn't working! – user3542912 Apr 22 '14 at 19:14
  • Just tried your code too btw. It worked for me first time! Impressive! I'm determined to understand it properly though so will work through your solution properly. Many thanks for your help! – user3542912 Apr 22 '14 at 19:31
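
The HttpWebRequest workaround mentioned in the comments could look roughly like the sketch below. It is not the asker's actual code: the endpoint URL and field names are not given in the thread, so they are left as parameters, and the form-encoded content type is an assumption that would need to match what the HTTP analyzer shows. FormPoster, BuildFormBody and Post are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper class, introduced only for illustration.
static class FormPoster
{
    // Builds a URL-encoded body such as "a=1&q=x%20y" from name/value pairs.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var parts = new List<string>();
        foreach (KeyValuePair<string, string> field in fields)
            parts.Add(Uri.EscapeDataString(field.Key) + "=" + Uri.EscapeDataString(field.Value));
        return string.Join("&", parts);
    }

    // Sends the body as a POST and returns the response text.
    // Assumes the server expects application/x-www-form-urlencoded data.
    public static string Post(string endpointUrl, string formBody)
    {
        byte[] payload = Encoding.UTF8.GetBytes(formBody);
        var request = (HttpWebRequest)WebRequest.Create(endpointUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = payload.Length;

        using (Stream body = request.GetRequestStream())
            body.Write(payload, 0, payload.Length);

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }
}
```

The request headers, cookies and body captured by the analyzer may all matter, so any of them might need to be copied onto the request before the server returns the same data as the browser sees.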

0 Answers