-2

There is a website (not built by me) that contains a paged grid spanning many pages. I want the contents of each page of the grid in an Excel sheet. Doing this manually would be cumbersome and not very smart.

Is it possible to do this with a C# .NET Windows application?

Are there any free tools that would help me achieve this, such as a web crawler or web spider?

TheBoyan
samar
  • 3
    LOL ripping off other people's websites. If you're going to do it, at least put in the hard yards and do it manually - your karma will marginally improve. Forgive me if I assume you're not going to ask the original author if you can rip off his/her website. – Pete855217 Jan 09 '12 at 09:25
  • 2
    @Pete855217 web scraping is a very common practice and there are many legitimate reasons to do it. It's not "ripping off" since anything you put on the internet is public and open. – MattDavey Jan 09 '12 at 09:37

2 Answers

2

This is called web scraping, and it is not a trivial task to do in code.

You can use the HttpWebRequest/HttpWebResponse classes or the WebClient class to download the pages themselves. Then you can use regular expressions, or an HTML parser such as HTML Agility Pack, to extract the data you need.
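
A minimal sketch of the download step, assuming the grid is paged through a query-string parameter. The parameter name `page`, the helper names `BuildPageUrl` and `DownloadPages`, and the example.com URL are all hypothetical; the real site may page quite differently.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

static class PagedFetcher
{
    // Hypothetical URL pattern: the real paging parameter has to be
    // discovered by inspecting the requests the site actually makes.
    public static string BuildPageUrl(string baseUrl, int page)
    {
        string separator = baseUrl.Contains("?") ? "&" : "?";
        return baseUrl + separator + "page=" + page;
    }

    // Downloads pages 1..pageCount and returns the raw HTML of each.
    public static List<string> DownloadPages(string baseUrl, int pageCount)
    {
        var pages = new List<string>();
        using (var client = new WebClient())
        {
            for (int page = 1; page <= pageCount; page++)
            {
                pages.Add(client.DownloadString(BuildPageUrl(baseUrl, page)));
            }
        }
        return pages;
    }
}

static class Program
{
    static void Main()
    {
        // example.com is a placeholder, not the site from the question.
        foreach (string html in PagedFetcher.DownloadPages("http://example.com/grid", 3))
        {
            Console.WriteLine(html.Length);
        }
    }
}
```

Note that WebClient is marked obsolete in recent .NET versions in favour of HttpClient; it still works and matches the classes named above.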

As for third-party tools, there are many questions already answered on SO; here's one you could take a look at: What's a good Web Crawler tool

Community
TheBoyan
0

You can use something like the HTML Agility Pack to fetch and parse the web page in C#, and then use an XPath query to extract the data you need. To emulate the paging, you'll need to know how the site formats its URL or query string for each page. If the table uses AJAX for paging, you'll probably need an external tool or sniffer to find the correct URL or query string; I recommend the Firebug plugin for Firefox for this.
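
As a rough sketch of the XPath step, assuming the HtmlAgilityPack NuGet package is referenced and the grid is an ordinary HTML table: the `id='grid'` attribute, the sample markup, and the `GridParser`/`ExtractRows` names are made up for illustration and will differ on a real site.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package

static class GridParser
{
    // Returns one string array per table row, one element per cell.
    public static List<string[]> ExtractRows(string html, string rowXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<string[]>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rowNodes = doc.DocumentNode.SelectNodes(rowXPath);
        if (rowNodes == null) return rows;

        foreach (HtmlNode row in rowNodes)
        {
            HtmlNodeCollection cellNodes = row.SelectNodes("./th | ./td");
            if (cellNodes == null) continue;

            var cells = new string[cellNodes.Count];
            for (int i = 0; i < cellNodes.Count; i++)
            {
                // Decode entities such as &amp; and trim surrounding whitespace.
                cells[i] = HtmlEntity.DeEntitize(cellNodes[i].InnerText).Trim();
            }
            rows.Add(cells);
        }
        return rows;
    }
}

static class Program
{
    static void Main()
    {
        // Hypothetical markup standing in for one downloaded page.
        string html = "<table id='grid'><tr><th>Name</th><th>Qty</th></tr>"
                    + "<tr><td>Apple</td><td>3</td></tr></table>";

        foreach (string[] cells in GridParser.ExtractRows(html, "//table[@id='grid']//tr"))
        {
            Console.WriteLine(string.Join(" | ", cells));
        }
    }
}
```

Running the same extraction over the HTML of each page gives the rows of the whole grid.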

Once you have extracted the HTML table via XPath, you could use XSLT to transform it into CSV format. A CSV file is then easy to import into Excel, for example using Office Interop.
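
The XSLT step might look something like this, using the framework's own XslCompiledTransform. This is only a sketch: it assumes the extracted table is already well-formed XML (real-world HTML often is not and may need cleaning first), the stylesheet does no CSV quoting, and the `TableToCsv` and `Convert` names are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class TableToCsv
{
    // Minimal stylesheet: one CSV line per <tr>, cells separated by commas.
    // Cells are not quoted, so values containing commas or quotes need extra handling.
    const string Stylesheet = @"<?xml version='1.0'?>
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='text'/>
  <xsl:template match='/'>
    <xsl:for-each select='//tr'>
      <xsl:for-each select='th|td'>
        <xsl:value-of select='normalize-space(.)'/>
        <xsl:if test='position() != last()'>,</xsl:if>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>";

    // Transforms a well-formed <table> fragment into CSV text.
    public static string Convert(string tableXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(tableXml)))
        using (var output = new StringWriter())
        {
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}

static class Program
{
    static void Main()
    {
        // Hypothetical table standing in for the extracted grid.
        string table = "<table><tr><th>Name</th><th>Qty</th></tr>"
                     + "<tr><td>Apple</td><td>3</td></tr></table>";
        File.WriteAllText("grid.csv", TableToCsv.Convert(table));
    }
}
```

The resulting grid.csv can be opened in Excel directly, so Interop is only needed if the import itself has to be automated.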

MattDavey