24

I am using C# HttpWebRequest to get some data from a web page. The problem is that some of the data is updated by JavaScript/AJAX after the page has loaded, so it is missing from the response string. Is there a way to have the web request wait until all the scripts on the page have finished executing?

Thanks

Amit

MattGrommes
Amit Raz

7 Answers

17

Just an idea, but there is a way to have .NET load a web page as if it were in a browser, using System.Windows.Forms.

You could load the web page into a WebBrowser control:

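// Requires a reference to System.Windows.Forms and an STA thread.
WebBrowser wb = new WebBrowser();
wb.ScrollBarsEnabled = false;
wb.ScriptErrorsSuppressed = true;
wb.Navigate(url);
// Pump the message loop until the document has finished loading.
while (wb.ReadyState != WebBrowserReadyState.Complete) { Application.DoEvents(); }
// DomDocument.ToString() only returns a type name, so read the markup from the live DOM instead.
string html = wb.Document.Body.OuterHtml;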

This will probably give you the pre-AJAX DOM, but maybe there is a way to let it run the AJAX first.
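One possible way to let the AJAX run first is to keep pumping messages after the load completes until an element that the script creates shows up. This is only a minimal sketch, assuming the page adds an element with a known id. The `AjaxWait` class, the `WaitForElement` helper, the element id, and the timeout are all hypothetical and not part of the original answer.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Windows.Forms;

static class AjaxWait
{
    // Hypothetical helper: pumps the message loop until an element with the
    // given id exists in the loaded document, or until the timeout elapses.
    // Must be called on the STA thread that owns the WebBrowser control.
    public static bool WaitForElement(WebBrowser wb, string elementId, TimeSpan timeout)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (clock.Elapsed < timeout)
        {
            // Let the control process navigation and script events.
            Application.DoEvents();

            if (wb.ReadyState == WebBrowserReadyState.Complete &&
                wb.Document != null &&
                wb.Document.GetElementById(elementId) != null)
            {
                return true; // the script-generated element is present
            }

            Thread.Sleep(25); // avoid spinning the CPU between checks
        }
        return false; // timed out; the element never appeared
    }
}
```

After `wb.Navigate(url)`, a call such as `AjaxWait.WaitForElement(wb, "prices", TimeSpan.FromSeconds(15))` (with `"prices"` standing in for whatever id the page really uses) would return true once the element exists, at which point `wb.Document.Body.OuterHtml` should contain the script-generated markup.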

missaghi
  • 1
    You will need to add a reference to System.Windows.Forms to access the WebBrowser class (if it's not already referenced in your project). – grasmi Mar 20 '13 at 11:48
  • 1
    Additionally, the following helped get a threaded version of this solution working. https://stackoverflow.com/questions/4269800/webbrowser-control-in-a-new-thread/4271581#4271581 – Jerrill May 26 '17 at 20:42
10

If I correctly interpret your question, there is no simple solution for your problem.

You are scraping the HTML from a server, and since your C# code is not a real web browser, it doesn't execute client scripts.

As a result, you can't access any information that isn't in the HTML you fetch.

Edit: I don't know how complex the AJAX calls on the original web site are, but you could use Firebug, or Fiddler for IE, to see how the requests are made and then make the same calls from your C# application. That way you could add the pieces of information you need. But it's only a theoretical solution.
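As a rough illustration of replaying such a call, here is a minimal sketch. It assumes Fiddler showed a plain GET to a JSON endpoint; the `AjaxReplay` class, the URLs in the usage note, and the choice of headers are hypothetical, and the real request may need different headers, cookies, or a POST body.

```csharp
using System;
using System.IO;
using System.Net;

static class AjaxReplay
{
    // Builds a request that resembles the XHR call seen in the network trace.
    public static HttpWebRequest BuildRequest(string endpointUrl, string refererUrl)
    {
        var request = (HttpWebRequest)WebRequest.Create(endpointUrl);
        request.Method = "GET";
        request.Accept = "application/json";
        // Referer is a restricted header, so it is set through the property.
        request.Referer = refererUrl;
        // Many AJAX libraries send this header, and some servers check for it.
        request.Headers["X-Requested-With"] = "XMLHttpRequest";
        return request;
    }

    // Sends the request and returns the response body as a string.
    public static string Fetch(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage would look something like `string json = AjaxReplay.Fetch(AjaxReplay.BuildRequest("https://example.com/api/data", "https://example.com/page"));`, with both URLs replaced by the ones observed in the trace.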

splattne
  • I edited my answer with a THEORETICAL solution... it depends on the circumstances, such as how often the pages change... – splattne Feb 05 '09 at 14:23
  • I am checking the data every 30 minutes. I think that's what I will have to do, bummer! – Amit Raz Feb 05 '09 at 14:25
  • Generally you would run a real browser and automate it from C#. https://www.seleniumhq.org/ – ed22 Jul 30 '19 at 07:16
4

Use HttpWebRequest to download the page, programmatically search the source code for the relevant AJAX information, and then use a new HttpWebRequest to pull that data down.
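A minimal sketch of that two-step approach follows. It assumes the page source contains a jQuery-style `url: "..."` setting; the `AjaxUrlFinder` class and the regular expression are hypothetical and would need adjusting to whatever the real page's script looks like.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

static class AjaxUrlFinder
{
    // Hypothetical pattern: matches url: "..." or url: '...' as it might
    // appear inside a $.ajax({ ... }) call in the page source.
    static readonly Regex UrlPattern = new Regex(
        @"url\s*:\s*[""'](?<url>[^""']+)[""']", RegexOptions.IgnoreCase);

    // Returns the first AJAX URL found in the page source, or null if none.
    public static Uri FindAjaxUrl(string pageHtml, Uri pageUri)
    {
        Match match = UrlPattern.Match(pageHtml);
        if (!match.Success) return null;
        // Resolve relative URLs against the address of the page itself.
        return new Uri(pageUri, match.Groups["url"].Value);
    }

    // Downloads the body of the given URI as a string.
    public static string Download(Uri uri)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The flow would be: call `Download` on the page address, pass the result to `FindAjaxUrl`, and, if it returns a non-null URI, call `Download` again on that URI to get the data.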

Chris Almond
4

When you open a web page in a web browser, it is the browser that executes the JavaScript and downloads the additional resources used by the page (images, scripts, etc.). HttpWebRequest by itself does none of this; it only downloads the HTML for the page you requested. It will never execute any of the JavaScript/AJAX code on its own.

Misko
3

HttpWebRequest does not emulate a web browser; it just downloads the resource you point it at. This means it will not execute, or even download, JavaScript files.

You would have to use something like Firebug to find the URL of the data being pulled in via JavaScript, and point your HttpWebRequest at that.

roryf
1

Use HttpWebRequest to download the page. Search the source code for the relevant AJAX information and then use a new HttpWebRequest to pull that data down.

Bob Kaufman
0

You could use PhantomJS. I had this issue and didn't find any other solution to my problem. In my opinion, this is the best solution.

My solution looks like this:

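var page = require('webpage').create();
// Assumption: jsonData holds the key/value pairs to seed into localStorage.
// It is not defined in the original snippet, so an empty object stands in here.
var jsonData = {};

page.open("https://sample.com", function () {
    // page.evaluate runs sandboxed inside the page, so jsonData is passed in as an argument.
    page.evaluate(function (oJson) {
        localStorage.clear();
        Object.keys(oJson).forEach(function (sKey) {
            localStorage.setItem(sKey, oJson[sKey]);
        });
    }, jsonData);

    // Open the page again so its scripts run with the seeded localStorage.
    page.open("https://sample.com", function () {
        // Give the AJAX calls 10 seconds to finish before reading the DOM.
        setTimeout(function () {
            page.render("screenshot.png"); // where you want to save the screenshot
            console.log(page.content); // page source after the scripts have run
            // You can access the content using jQuery, assuming the page itself loads it.
            // Only serializable values can be returned from evaluate, so return HTML, not a jQuery object.
            var fbcomments = page.evaluate(function () {
                return $("body").contents().find(".content").html();
            });
            phantom.exit();
        }, 10000);
    });
});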
Community