How can I extract the required data from a web page?

Question

Hi all. As part of a requirement I have, I would like to extract data from this site:

http://loving1.tea.state.tx.us/lonestar/Menu_dist.aspx?parameter=101902

I would like to extract the data that is presented in the grid. How can I do this? Can anyone help me?

I tried this:

    // Requires: using System.IO; using System.Net;
    string html;
    WebRequest request = WebRequest.Create("http://loving1.tea.state.tx.us/lonestar/Menu_dist.aspx?parameter=101902");
    // Dispose the response and both streams once the page has been read.
    using (WebResponse response = request.GetResponse())
    using (Stream data = response.GetResponseStream())
    using (StreamReader sr = new StreamReader(data))
    {
        html = sr.ReadToEnd();
    }


The grid data I would like to extract is shown in the image. Please help.

Why don't you simply ask them where they get the data from and request it through an API that they might have? Otherwise you will probably run into legal issues here... – balexandre, Nov 07 '11 at 10:20
There is an Export To Excel action button. I believe you can export to Excel and then parse the table, or, more straightforwardly, read the whole HTML page and parse it by finding the specific table tag. – sll, Nov 07 '11 at 10:21
How about using [Html Agility Pack](http://htmlagilitypack.codeplex.com/) for this? – V4Vendetta, Nov 07 '11 at 10:23

sll · Answer 1 · 2011-11-08T10:32:46.923

1

The straightforward way is to download the page and parse the HTML by locating the appropriate <table> tags, but then your "parser" has to be updated every time the HTML layout changes.

Another way is to leverage the "Export To..." feature that the site kindly provides: you can simulate the HTTP request made by the "Export to Excel 2007" button. The idea is that an Excel 2007 workbook (.xlsx) is a ZIP archive containing XML files (worksheet data, shared strings, styles), so you would be able to load the well-formed XML data file or files directly.

Underlying URL:

http://loving1.tea.state.tx.us/Common.Cognos/Secured/ReportViewer.aspx?reportSearchPath=/content/folder[@name='TPEIR']/folder[@name='LS']/package[@name='Districts and Schools']/report[@name='AAG5_Dist_Over']&ui.name=AAG5_Dist_Over&year=2010&district=101902&server=Loving1.tea.state.tx.us/lonestar

Then download the XLSX file, which is a ZIP archive with embedded XML files:

xl\worksheets\Sheet1.xml
xl\workbook.xml

So just unzip it, load the XML, and enjoy...
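
The unzip-and-load steps above could be sketched roughly as follows. This is only a minimal illustration under several assumptions: it uses `System.IO.Compression.ZipArchive` (available from .NET 4.5, so newer than this thread), the class and method names (`XlsxReader`, `ReadRows`) are hypothetical, and the entry names `xl/worksheets/sheet1.xml` and `xl/sharedStrings.xml` are the usual defaults rather than something confirmed for the file this site exports.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: reads one worksheet of an .xlsx file as rows of cell text.
public static class XlsxReader
{
    // Main SpreadsheetML namespace used by worksheet and shared-string parts.
    static readonly XNamespace Ns = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    public static List<List<string>> ReadRows(Stream xlsx, string sheetPath = "xl/worksheets/sheet1.xml")
    {
        using (var zip = new ZipArchive(xlsx, ZipArchiveMode.Read))
        {
            // Text cells usually store an index into the shared-strings table.
            var shared = new List<string>();
            var sharedEntry = FindEntry(zip, "xl/sharedStrings.xml");
            if (sharedEntry != null)
            {
                using (var s = sharedEntry.Open())
                {
                    shared = XDocument.Load(s).Root.Elements(Ns + "si")
                        .Select(si => string.Concat(si.Descendants(Ns + "t").Select(t => t.Value)))
                        .ToList();
                }
            }

            var sheetEntry = FindEntry(zip, sheetPath);
            if (sheetEntry == null) throw new FileNotFoundException(sheetPath);
            using (var s = sheetEntry.Open())
            {
                // Empty cells are omitted from the XML, so columns may not line up
                // across rows unless the cell reference attribute "r" is also used.
                return XDocument.Load(s).Descendants(Ns + "row")
                    .Select(row => row.Elements(Ns + "c").Select(c => CellText(c, shared)).ToList())
                    .ToList();
            }
        }
    }

    // Entry names are compared case-insensitively and with forward slashes.
    static ZipArchiveEntry FindEntry(ZipArchive zip, string path)
    {
        return zip.Entries.FirstOrDefault(e =>
            string.Equals(e.FullName.Replace('\\', '/'), path, StringComparison.OrdinalIgnoreCase));
    }

    static string CellText(XElement c, List<string> shared)
    {
        string type = (string)c.Attribute("t");
        if (type == "inlineStr")
            return string.Concat(c.Descendants(Ns + "t").Select(t => t.Value));
        string v = (string)c.Element(Ns + "v") ?? "";
        return type == "s" ? shared[int.Parse(v)] : v; // "s" = shared-string index
    }
}
```

Given the downloaded file as a stream, for example `File.OpenRead("report.xlsx")` with a placeholder file name, `XlsxReader.ReadRows(stream)` returns one list of strings per worksheet row. Numeric and date cells come back as their raw stored values, so further conversion may be needed.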

edited Nov 08 '11 at 10:32

answered Nov 07 '11 at 10:24

sll


`@sll` So, as you say, I have to save the XLSX and then read the required content from it, right? – Vivekh Nov 07 '11 at 11:56
@Vivekh : Yep, basically 1) save the XLSX, 2) unzip it ([use any of these libs](http://stackoverflow.com/questions/1023476/recommend-a-library-api-to-unzip-file-in-c-sharp)), 3) load the required XML files. – sll Nov 07 '11 at 13:56
Hi `Sll`, a small question: will I get the XLSX from the link you posted? – Vivekh Nov 07 '11 at 14:04
You can use Fiddler to see the exact URL while downloading the file (I believe you can see the Export button at the top of the page). – sll Nov 07 '11 at 20:27

score 1 · Answer 2 · edited Nov 07 '11 at 10:36

1

Use WebClient.DownloadString("http://loving1.tea.state.tx.us/lonestar/Menu_dist.aspx?parameter=101902") to get the data from the server.
Then use HtmlAgilityPack to parse the HTML.
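
A rough sketch of that approach might look like the following. It assumes the HtmlAgilityPack NuGet package is referenced; the names `GridScraper`, `ExtractRows` and `Download` are hypothetical, and the default XPath `//table//tr` is a placeholder that selects the rows of every table on the page, since the actual id or class of the grid table is not known here. It also assumes the grid is present in the downloaded HTML rather than rendered later by script or inside a frame.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

// Hypothetical helper: turns HTML table rows into lists of cell text.
public static class GridScraper
{
    public static List<List<string>> ExtractRows(string html, string rowXPath = "//table//tr")
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<List<string>>();
        var rows = doc.DocumentNode.SelectNodes(rowXPath);
        if (rows == null) return result; // SelectNodes returns null when nothing matches

        foreach (var row in rows)
        {
            // Take only the row's own header and data cells.
            var cells = row.SelectNodes("./th|./td");
            if (cells == null) continue;
            result.Add(cells
                .Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim())
                .ToList());
        }
        return result;
    }

    // Downloads the page and extracts the rows in one step.
    public static List<List<string>> Download(string url)
    {
        using (var client = new WebClient())
        {
            return ExtractRows(client.DownloadString(url));
        }
    }
}
```

Once the real grid table has been identified in the page source, a narrower XPath (for instance one keyed on the table's id) can be passed as `rowXPath`, so that layout tables elsewhere on the page are ignored.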

edited Nov 07 '11 at 10:36

Developer


answered Nov 07 '11 at 10:25

Svarog


As @Oded wrote, you can do it all with Agility Pack: first download the page, then extract the data with XPath. – sasjaq Nov 07 '11 at 13:37
