Hi all. As per my requirement, I would like to extract data from this site:

http://loving1.tea.state.tx.us/lonestar/Menu_dist.aspx?parameter=101902

I would like to extract the data that is presented in the grid. How can I do this? Can anyone help me?

I tried this

    WebRequest request = WebRequest.Create("http://loving1.tea.state.tx.us/lonestar/Menu_dist.aspx?parameter=101902");
    WebResponse response = request.GetResponse();
    Stream data = response.GetResponseStream();
    string html = String.Empty;
    using (StreamReader sr = new StreamReader(data))
    {
        html = sr.ReadToEnd();
    }

[Image: screenshot of the grid data on the page]

The grid data I would like to extract is shown in the image. Please help.

Vivekh
  • Why don't you simply ask them where they get the data from and request it through an API that they might have? Otherwise you will probably run into legal issues here... – balexandre Nov 07 '11 at 10:20
  • There is an "Export To Excel" action button; I believe you can export to Excel and then parse the table. A more straightforward way is to read the whole HTML page and parse it by finding the specific table tag. – sll Nov 07 '11 at 10:21
  • I am unable to find the specified tag when I view the source. – Vivekh Nov 07 '11 at 10:22
  • How about using [Html Agility](http://htmlagilitypack.codeplex.com/) for this? – V4Vendetta Nov 07 '11 at 10:23

2 Answers

The straightforward way is to download the page and parse the HTML by locating the appropriate <table> tags, but then your "parser" has to be updated every time the HTML layout changes.

Another way is to leverage the "Export To..." feature that the site kindly provides: you can simulate the HTTP request issued by the "Export to Excel 2007" button. The idea is that an Excel 2007 workbook is a ZIP archive containing XML files (worksheets, styles, and so on), so you can load the well-formed XML data file or files directly.

Underlying URL:

http://loving1.tea.state.tx.us/Common.Cognos/Secured/ReportViewer.aspx?reportSearchPath=/content/folder[@name='TPEIR']/folder[@name='LS']/package[@name='Districts and Schools']/report[@name='AAG5_Dist_Over']&ui.name=AAG5_Dist_Over&year=2010&district=101902&server=Loving1.tea.state.tx.us/lonestar

Then download the XLSX file, which is a ZIP archive with embedded XML files:

  • xl\worksheets\Sheet1.xml
  • xl\workbook.xml

So just unzip it, load the XML, and enjoy...
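
The unzip-and-load step can be sketched roughly as follows. This is a minimal illustration, not the site's actual export: it assumes .NET 4.5+ (`System.IO.Compression.ZipArchive`), and the class `XlsxSketch`, its methods, and the tiny in-memory workbook are all hypothetical stand-ins so the code runs without a download. Note that in a real workbook, cells marked `t="s"` hold an index into `xl/sharedStrings.xml` rather than the text itself, which this sketch does not resolve.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class XlsxSketch
{
    // Main SpreadsheetML namespace used by worksheet parts.
    static readonly XNamespace S =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    static string Normalize(string path)
    {
        return path.Replace('\\', '/').ToLowerInvariant();
    }

    // Opens the XLSX as a ZIP archive and loads one XML part from it.
    public static XDocument LoadPart(Stream xlsx, string partName)
    {
        using (var zip = new ZipArchive(xlsx, ZipArchiveMode.Read, true))
        {
            var entry = zip.Entries.FirstOrDefault(
                e => Normalize(e.FullName) == Normalize(partName));
            if (entry == null)
                throw new FileNotFoundException("Part not found: " + partName);
            using (var s = entry.Open())
                return XDocument.Load(s);
        }
    }

    // Returns the raw <v> value of every cell, row by row.
    // Shared-string cells (t="s") come back as indexes, not text.
    public static List<List<string>> ReadRows(XDocument sheet)
    {
        return sheet.Descendants(S + "row")
            .Select(r => r.Elements(S + "c")
                .Select(c => (string)c.Element(S + "v") ?? "")
                .ToList())
            .ToList();
    }

    // Hypothetical sample data: a one-row workbook built in memory.
    public static MemoryStream BuildSample()
    {
        var ms = new MemoryStream();
        using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, true))
        {
            var entry = zip.CreateEntry("xl/worksheets/sheet1.xml");
            using (var w = new StreamWriter(entry.Open(), Encoding.UTF8))
            {
                w.Write("<worksheet xmlns=\"" + S.NamespaceName + "\"><sheetData>" +
                        "<row r=\"1\"><c r=\"A1\"><v>2010</v></c>" +
                        "<c r=\"B1\"><v>42</v></c></row>" +
                        "</sheetData></worksheet>");
            }
        }
        ms.Position = 0;
        return ms;
    }

    public static void Main()
    {
        // With a real export, replace BuildSample() with
        // File.OpenRead(pathToDownloadedXlsx).
        using (var xlsx = BuildSample())
        {
            var sheet = LoadPart(xlsx, @"xl\worksheets\Sheet1.xml");
            foreach (var row in ReadRows(sheet))
                Console.WriteLine(string.Join(", ", row));
        }
    }
}
```

Reading the part straight from the archive stream avoids extracting files to disk; the part name is matched case-insensitively and with either slash style because the paths listed above use backslashes.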

sll
  • `@sll` So, as you say, I have to save the XLSX and then read the required content from it, right? – Vivekh Nov 07 '11 at 11:56
  • @Vivekh : Yep, basically 1) save the XLSX, 2) unzip it ([use any of these libs](http://stackoverflow.com/questions/1023476/recommend-a-library-api-to-unzip-file-in-c-sharp)), 3) load the required XML files. – sll Nov 07 '11 at 13:56
  • Hi `Sll`, a small question: will I get the XLSX from the link you posted? – Vivekh Nov 07 '11 at 14:04
  • You can use Fiddler to see the exact URL while downloading a file (I believe you can see the Export button at the top of the page). – sll Nov 07 '11 at 20:27

Use WebClient.DownloadString("http://loving1.tea.state.tx.us/lonestar/Menu_dist.aspx?parameter=101902") to get the data from the server,
and then use HTMLAgilityPack to parse the HTML.
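
A rough sketch of that approach, assuming the HtmlAgilityPack NuGet package is referenced. The class `GridSketch`, the `ExtractRows` helper, the `grid` table id, and the inline sample HTML are hypothetical; the real page's markup will differ, and if the grid is rendered by a separate request (as the question's comments suggest), the downloaded HTML may not contain the table at all.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class GridSketch
{
    // Returns the text of every cell in the table matched by tableXPath.
    // Rows of nested tables, if any, are included as well.
    public static List<List<string>> ExtractRows(string html, string tableXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<List<string>>();
        // SelectNodes returns null (not an empty list) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes(tableXPath + "//tr");
        if (rows == null) return result;

        foreach (var row in rows)
        {
            var cells = row.SelectNodes("th|td");
            if (cells == null) continue;
            result.Add(cells
                .Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim())
                .ToList());
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical sample markup. For the real page, something like:
        //   string html = new System.Net.WebClient().DownloadString(url);
        string html =
            "<html><body><table id='grid'>" +
            "<tr><th>Year</th><th>Count</th></tr>" +
            "<tr><td>2010</td><td>42</td></tr>" +
            "</table></body></html>";

        foreach (var row in ExtractRows(html, "//table[@id='grid']"))
            Console.WriteLine(string.Join(" | ", row));
    }
}
```

The XPath passed in is the part that has to be adapted whenever the page layout changes, so keeping it as a parameter isolates that fragility in one place.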

Developer
Svarog
  • As @Oded wrote, you can do it all with Agility Pack: first download the page, then extract the data with XPath. – sasjaq Nov 07 '11 at 13:37