I'm trying to scrape some schedules from a website. The information is displayed in a GridView with paging.

The url is: http://www.landmarkworldwide.com/when-and-where/register/search-results.aspx?prgid=0&pgID=270&crid=0&ctid=&sdt=0

My issue is scraping pages other than #1 in the GridView. The best post I found so far was This One, but it doesn't work and that topic is not complete. I tried to use Fiddler and Chrome to capture the POST data and reuse it, but I can't get it to work. Can you see what's missing?

Here's the code I am using. It's in VB, but you can answer in C# and I'll translate :-) (sorry)

    Protected Sub Page_Load(sender As Object, e As System.EventArgs) Handles Me.Load

    Dim lcUrl As String = "http://www.landmarkworldwide.com/when-and-where/register/search-results.aspx?prgid=0&pgID=270&crid=0&ctid=&sdt=0"

    ' first, request the login form to get the viewstate value
    Dim webRequest__1 As HttpWebRequest = TryCast(WebRequest.Create(lcUrl), HttpWebRequest)
    Dim responseReader As New StreamReader(webRequest__1.GetResponse().GetResponseStream())
    Dim responseData As String = responseReader.ReadToEnd()
    responseReader.Close()

    ' extract the viewstate value and build out POST data
    Dim viewState As String = ExtractViewState(responseData)


    Dim loHttp As HttpWebRequest = DirectCast(WebRequest.Create(lcUrl), HttpWebRequest)
    ' *** Send any POST data

    Dim lcPostData As String = [String].Format("__VIEWSTATE={0}&__EVENTTARGET={1}&__EVENTARGUMENT={2}", viewState, HttpUtility.UrlEncode("contentwrapper_0$maincontent_0$maincontentfullwidth_0$ucSearchResults$gvPrograms"), HttpUtility.UrlEncode("Page$3"))
    loHttp.Method = "POST"

    Dim lbPostBuffer As Byte() = System.Text.Encoding.GetEncoding(1252).GetBytes(lcPostData)
    loHttp.ContentLength = lbPostBuffer.Length
    Dim loPostData As Stream = loHttp.GetRequestStream()
    loPostData.Write(lbPostBuffer, 0, lbPostBuffer.Length)
    loPostData.Close()

    Dim loWebResponse As HttpWebResponse = DirectCast(loHttp.GetResponse(), HttpWebResponse)
    Dim enc As Encoding = System.Text.Encoding.GetEncoding(1252)

    Dim loResponseStream As New StreamReader(loWebResponse.GetResponseStream(), enc)
    Dim lcHtml As String = loResponseStream.ReadToEnd()

    loWebResponse.Close()
    loResponseStream.Close()
    Response.Write(lcHtml)

    End Sub

    Private Function ExtractViewState(s As String) As String
        Dim viewStateNameDelimiter As String = "__VIEWSTATE"
        Dim valueDelimiter As String = "value="""

        Dim viewStateNamePosition As Integer = s.IndexOf(viewStateNameDelimiter)
        Dim viewStateValuePosition As Integer = s.IndexOf(valueDelimiter, viewStateNamePosition)

        Dim viewStateStartPosition As Integer = viewStateValuePosition + valueDelimiter.Length
        Dim viewStateEndPosition As Integer = s.IndexOf("""", viewStateStartPosition)

        Return HttpUtility.UrlEncodeUnicode(s.Substring(viewStateStartPosition, viewStateEndPosition - viewStateStartPosition))
    End Function
ILevran

1 Answer

To make it work you need to post all of the page's input fields, not only the view state. Other critical data, such as __EVENTVALIDATION, is missing from your request. So:

First, scrape page #1: load it and use the Html Agility Pack to parse it into a usable document structure.

Then extract from that structure the input fields you need to post. Here is a code snippet, taken from the answer "HTML Agility Pack get all input fields", that shows how to do it.

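    // doc is an HtmlAgilityPack.HtmlDocument loaded with the HTML of page #1
    // note: SelectNodes returns null when no node matches
    foreach (HtmlNode input in doc.DocumentNode.SelectNodes("//input"))
    {
        // use the name/value pair of each input to create the post string
        string name = input.GetAttributeValue("name", "");
        string value = input.GetAttributeValue("value", "");
    }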
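To make the loop above more concrete, here is a rough sketch of how the collected fields could be turned into a POST string. It assumes the HtmlAgilityPack NuGet package is referenced. The class and method names (`PostBackForm`, `BuildPagingPostData`) are made up for illustration, and the rule for skipping buttons and unchecked boxes is my simplification of what a browser does, not something verified against this site.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is installed

// Hypothetical helper, not part of any library.
public static class PostBackForm
{
    // Collects the named <input> fields of the page and overrides the two
    // fields that tell ASP.NET which control raised the postback.
    public static string BuildPagingPostData(string html, string eventTarget, string eventArgument)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var fields = new Dictionary<string, string>();

        // SelectNodes returns null (not an empty list) when nothing matches.
        var inputs = doc.DocumentNode.SelectNodes("//input[@name]");
        if (inputs != null)
        {
            foreach (HtmlNode input in inputs)
            {
                string name = input.GetAttributeValue("name", "");
                string type = input.GetAttributeValue("type", "text").ToLowerInvariant();

                if (name.Length == 0) continue;

                // Posting a button's name would look like a click on that button.
                if (type == "submit" || type == "button" || type == "image" || type == "reset") continue;

                // Unchecked checkboxes and radio buttons are not sent by browsers.
                if ((type == "checkbox" || type == "radio") && input.Attributes["checked"] == null) continue;

                fields[name] = input.GetAttributeValue("value", "");
            }
        }

        // Simulate the click on the pager link.
        fields["__EVENTTARGET"] = eventTarget;
        fields["__EVENTARGUMENT"] = eventArgument;

        // URL-encode every name and value exactly once.
        return string.Join("&", fields.Select(f =>
            WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value)));
    }
}
```

With the question's values this would be called as `PostBackForm.BuildPagingPostData(responseData, "contentwrapper_0$maincontent_0$maincontentfullwidth_0$ucSearchResults$gvPrograms", "Page$3")`. Pass the raw values here, since the helper does the encoding itself. `<select>` and `<textarea>` elements are not covered by this sketch and would need the same treatment if the page uses them.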

Once you have the post data needed for a valid postback, move on to the next step and send it. Here is an example: How to pass POST parameters to ASP.Net web request?
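For the sending step, something along these lines should work with `HttpWebRequest`, the same class the question uses. Treat it as a sketch: `PostBackClient` and `Post` are hypothetical names. Setting `ContentType` is the part I would check first, because ASP.NET only parses the body into form fields when the request is marked as form data, and the code in the question never sets it. The `CookieContainer` is there on the assumption that the site may rely on a session cookie; I have not confirmed that it does.

```csharp
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper, not part of any library.
public static class PostBackClient
{
    // Sends an already URL-encoded form body and returns the response HTML.
    public static string Post(string url, string postData, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        // Without this header the server does not treat the body as form fields.
        request.ContentType = "application/x-www-form-urlencoded";
        // Reuse the same container for the initial GET and the POST
        // so that any cookies set by the first response are sent back.
        request.CookieContainer = cookies;

        // The body is URL-encoded, so it only contains ASCII characters.
        byte[] body = Encoding.ASCII.GetBytes(postData);
        request.ContentLength = body.Length;

        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

To go from page to page, run the field extraction again on each response before the next POST, because `__VIEWSTATE` and `__EVENTVALIDATION` change with every postback.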

You can also read: How to use HTML Agility pack

Aristos