I am requesting .aspx pages with HttpClient in C#, and the response contains the redirected page. Below is part of that response. How can I retrieve the highlighted values and store them in a model?

    <div id="SchoolInfo">
    **ABES School** (**13895**)<br/>Tel (987) 334 5533  <br />
    <form name="form2" id="form2" method="post">
        <input type="hidden" name="MyTargetID" id="MyTargetID" value="" />
        <input type="hidden" name="MyArgument" id="MyArgument" value="" />
        <input name="dcb46ec8-be16-4932-8a01-49cd075271a6$hdnOldSelection" type="hidden" id="dcb46ec8-be16-4932-8a01-49cd075271a6_hdnOldSelection" value="73400" />
    Switch School Year:
        <select name="dcb46ec8-be16-4932-8a01-49cd075271a6$ddlSwitchSessionYear" id="dcb46ec8-be16-4932-8a01-49cd075271a6_ddlSwitchSessionYear" onchange="confirmsessionchange(this);">
        <option selected="selected" value="**73400**">**2020-2021**</option>

    School Year : 2020-2021
    School ID   : 73400
    School Code : 13895
    School Name : ABES School

  • Since you're essentially _scraping html_ then **parse the html** (use a [library](https://html-agility-pack.net/), [not regex](https://stackoverflow.com/a/1732454/304683)) to get the data you want. – EdSF Jul 08 '20 at 21:01

1 Answer


To retrieve data from an HTML document, you can use Html Agility Pack.

Your code will look like this:

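    // Requires the HtmlAgilityPack NuGet package.
    using HtmlAgilityPack;

    var url = @"http://html-agility-pack.net/"; // replace with the page you are scraping
    var web = new HtmlWeb();
    var htmlDoc = web.Load(url); // downloads and parses the page
    // Select the <div> that wraps the school details.
    var node = htmlDoc.DocumentNode.SelectSingleNode("//div[@id='SchoolInfo']");
    ...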

From there, experiment with the XPath selectors and drill down to the child nodes you need.

Finally, to retrieve the School Name, School Code, and School Year, you will need the InnerText property of the node. In your example the name and code are run together in one string, so you will need to split them. A regular expression works for that.
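As a rough sketch of that splitting step, the snippet below pulls the name and code out of text shaped like `ABES School (13895)`. The sample string, the helper name `TryParseNameAndCode`, and the pattern itself are assumptions for illustration. The real InnerText may contain different whitespace or extra text, so adjust the pattern to what you actually receive.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample of the text at the start of the SchoolInfo <div>.
string innerText = "\n    ABES School (13895)Tel (987) 334 5533  \n";

if (TryParseNameAndCode(innerText, out var schoolName, out var schoolCode))
{
    Console.WriteLine($"School Name : {schoolName}");
    Console.WriteLine($"School Code : {schoolCode}");
}

// Splits leading text of the form "<name> (<digits>)" into its two parts.
// Returns false when the text does not start with that shape.
static bool TryParseNameAndCode(string text, out string name, out string code)
{
    // Lazy name capture, then the first parenthesized run of digits.
    var match = Regex.Match(text, @"^\s*(?<name>.+?)\s*\((?<code>\d+)\)");
    name = match.Success ? match.Groups["name"].Value : "";
    code = match.Success ? match.Groups["code"].Value : "";
    return match.Success;
}
```

The pattern is anchored at the start of the text, so the later `(987)` in the phone number is not picked up as the code.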

To retrieve the School ID, you can use the node's GetAttributeValue method.
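Putting the last two steps together might look like the sketch below. It assumes you already hold the response body as a string (as you would after an HttpClient call), so it uses `LoadHtml` rather than `HtmlWeb.Load`. The sample markup is trimmed from the question, and the XPath assumes the `<select>` id always contains `ddlSwitchSessionYear`, since the GUID prefix may well change between requests.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical stand-in for the string returned by HttpClient.
string html = @"<div id=""SchoolInfo"">
ABES School (13895)<br/>Tel (987) 334 5533 <br />
<form name=""form2"" id=""form2"" method=""post"">
<select id=""dcb46ec8-be16-4932-8a01-49cd075271a6_ddlSwitchSessionYear"">
<option selected=""selected"" value=""73400"">2020-2021</option>
</select>
</form>
</div>";

// Some older Html Agility Pack versions reportedly treat <option> as an
// empty element, which leaves its InnerText blank. Removing the flag is a
// harmless no-op when it is not present.
HtmlNode.ElementsFlags.Remove("option");

var doc = new HtmlDocument();
doc.LoadHtml(html);

// Find the selected <option> of the school-year drop-down.
var option = doc.DocumentNode.SelectSingleNode(
    "//select[contains(@id,'ddlSwitchSessionYear')]/option[@selected]");

string schoolId = "";
string schoolYear = "";
if (option != null)
{
    // School ID is the value attribute; School Year is the visible text.
    schoolId = option.GetAttributeValue("value", "");
    schoolYear = HtmlEntity.DeEntitize(option.InnerText).Trim();
}

Console.WriteLine($"School Year : {schoolYear}");
Console.WriteLine($"School ID : {schoolId}");
```

`SelectSingleNode` returns null when nothing matches, so the null check matters if the page layout ever changes.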