1

Is there a way in C# to get the output of AJAX or Java? What I'm trying to do is grab the details of items on a webpage; however, the page does not include them in its original source. Does anybody have a good tutorial or a good place to start?

For example, I would want to get all the car listings from http://www.madisonhonda.com/Preowned-Inventory.aspx#layout=layout1

mkj
  • nFreeze's answer to your original question will work. You would create a Windows Forms application, put the WebBrowser control into the form, direct the control to load the page, wait for the JavaScript to run, and then access the DOM using the Document property. – Michael Petito Jun 07 '11 at 01:05
  • Yes, except that would require loading thousands of pages into the browser, which would take an extremely long time. I'm looking for a way to just load it. –  Jun 07 '11 at 01:15
  • Okay, what's a good example of accessing the DOM? –  Jun 07 '11 at 01:38
  • Are you confusing [Java](http://en.wikipedia.org/wiki/Java) with [JavaScript](http://en.wikipedia.org/wiki/JavaScript)? – sarnold Jun 07 '11 at 01:52

4 Answers

3

If the DOM is being modified by JavaScript through AJAX calls, and this modified data is what you are trying to capture, then using a standard .NET WebClient won't work. You need to use a WebBrowser control so that it will actually execute the script; otherwise you will just be downloading the source.
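Since a C# example keeps coming up, here is a minimal sketch of that approach, not a definitive implementation. It assumes Windows with WinForms available; the `RenderedPageFetcher` class, the `GetRenderedHtml` method, and the fixed "settle" delay are made up for illustration. The delay is a crude stand-in for "wait for the script to finish", because `DocumentCompleted` can fire before AJAX responses arrive.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

static class RenderedPageFetcher
{
    // Hypothetical helper: loads a URL in an off-screen WebBrowser control and
    // returns the <body> markup as it looks after scripts have had time to run.
    public static string GetRenderedHtml(string url, TimeSpan settleTime)
    {
        string html = null;

        // The WebBrowser control needs an STA thread with a message loop.
        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser { ScriptErrorsSuppressed = true })
            using (var timer = new System.Windows.Forms.Timer())
            {
                timer.Interval = Math.Max(1, (int)settleTime.TotalMilliseconds);

                timer.Tick += (s, e) =>
                {
                    timer.Stop();
                    // Read the live DOM, not the originally downloaded source.
                    html = browser.Document?.Body?.OuterHtml;
                    Application.ExitThread();
                };

                browser.DocumentCompleted += (s, e) =>
                {
                    // The event fires once per frame; react only to the top-level page.
                    if (e.Url == browser.Url) timer.Start();
                };

                browser.Navigate(url);
                Application.Run(); // pump messages until ExitThread is called
            }
        });

        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();
        return html;
    }
}
```

You would call it as `RenderedPageFetcher.GetRenderedHtml(url, TimeSpan.FromSeconds(5))` and then parse the returned markup. Note that this sketch has no timeout, so it blocks forever if the page never finishes loading; a real scraper would want one.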

mr.freeze
  • Okay, do you have a good C# example? I found this http://blog.motane.lu/2009/06/18/pywebkitgtk-execute-javascript-from-python/ but no example of how to do it in C#. –  Jun 06 '11 at 02:00
  • See the `Document` property of the `WebBrowser` control nFreeze mentioned. You can access the DOM via this property, after the client side scripting has processed the AJAX results. – Michael Petito Jun 06 '11 at 03:52
  • Hmm, I'm not going to be able to handle this one from scratch. Do you know of a good example? –  Jun 06 '11 at 03:54
1

If you need to just "load" it, then you'll need to understand how the page functions and try making the AJAX call yourself. Firebug and other similar tools allow you to see what requests are made by the browser.

There is no reason you cannot make the same web request from C# that the original page is making from JavaScript. Depending on the architecture of the website, this could range in difficulty from constructing the proper URL with query string arguments (easy) to simulating a POST with lots of page state (hard). The response content would most likely then be XML or JSON instead of the HTML DOM, which is a plus if you're scraping for data.
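As a rough sketch of the easy case (a GET with query string arguments that returns JSON): the endpoint, the argument names, and the `title` field below are all hypothetical, so substitute whatever Firebug shows the page actually requesting. It assumes `System.Text.Json` is available (.NET Core 3.0 or later, or the NuGet package on .NET Framework).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

static class InventoryClient
{
    private static readonly HttpClient Http = new HttpClient();

    // Builds the request URL by appending URL-encoded query string arguments.
    public static string BuildUrl(string endpoint, IDictionary<string, string> args)
    {
        string query = string.Join("&", args.Select(kv =>
            Uri.EscapeDataString(kv.Key) + "=" + Uri.EscapeDataString(kv.Value)));
        return query.Length == 0 ? endpoint : endpoint + "?" + query;
    }

    // Pulls the "title" property out of each object in a JSON array.
    // The response shape is an assumption; adjust it to the real payload.
    public static List<string> ParseTitles(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            return doc.RootElement.EnumerateArray()
                .Select(item => item.GetProperty("title").GetString())
                .ToList();
        }
    }

    // Issues the same kind of GET the page's script would make and parses the result.
    public static async Task<List<string>> FetchTitlesAsync(
        string endpoint, IDictionary<string, string> args)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, BuildUrl(endpoint, args)))
        {
            // Some servers only answer AJAX-style requests; whether this header
            // is needed for a given site is an assumption to verify.
            request.Headers.Add("X-Requested-With", "XMLHttpRequest");

            using (HttpResponseMessage response = await Http.SendAsync(request))
            {
                response.EnsureSuccessStatusCode();
                return ParseTitles(await response.Content.ReadAsStringAsync());
            }
        }
    }
}
```

For example, `await InventoryClient.FetchTitlesAsync("https://example.com/api/inventory", new Dictionary<string, string> { { "page", "1" } })` would request page 1 from that placeholder endpoint. Because no browser is involved, this scales to thousands of requests far better than loading each page in a control.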

Michael Petito
0

A long time ago I wrote a VB app to screen-scrape financial sites, and designed it so that you could run several of these "harvester" scrapers at once. That might cut the time spent loading data. We could do thousands of scrapes a day with several of them running on multiple boxes. Each harvester got its marching orders from information stored in the database, such as which customer to get next and what to scrape (balances, transaction history, etc.).

Like Michael said above, make a simple WinForms app with a WebBrowser control in it. You have to handle the `DocumentCompleted` event (`DocumentComplete` on the underlying ActiveX control). That should only fire when the web page is completely loaded. Then check out this post which gives an overview of how to do it.

Rob
  • Yup, but keep in mind that the responses to any AJAX requests that are used to modify the DOM might come in well after the `DocumentComplete` event fires. – Michael Petito Jun 07 '11 at 02:00
  • Thanks to both Michael and Rob. When I have it complete, I'll post the entire dang code. –  Jun 07 '11 at 02:39
-1

Use the Html Agility Pack. It can download HTML and lets you scrape it via XPath.

See How to use HTML Agility pack
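A small sketch of the XPath side, assuming the HtmlAgilityPack NuGet package is installed; the `LinkScraper` class and `ExtractLinks` method are made-up names for illustration. Keep in mind that the library parses whatever markup it is given and does not execute scripts, so it only sees content that is present in the HTML you feed it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

static class LinkScraper
{
    // Returns the href value of every anchor that has one in the given HTML.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null) return new List<string>();

        return anchors.Select(a => a.GetAttributeValue("href", "")).ToList();
    }
}
```

To work from a URL instead of a string, `new HtmlWeb().Load(url)` downloads the page and returns an `HtmlDocument` you can query the same way.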

Richard Schneider