I am trying to scrape a website written in PHP to extract some information from a particular table. Here is the scenario.
The landing page has a form that takes queries from the user and searches based on them. If I leave those fields empty and click "Submit", it returns the full result set, which is what I am interested in. At first I did not know about the HttpWebRequest class, so I simply passed the URL to the HtmlWeb.Load(URL) method in the HtmlAgilityPack library, which obviously was not the way to go.
Then I searched for HttpWebRequest and found an example like this:
Dim cookies As New CookieContainer
Dim postData As String = "postData obtained using the Live HTTP Headers plugin in Firefox"
Dim encoding As New UTF8Encoding
Dim byteData As Byte() = encoding.GetBytes(postData)
Dim postRequest As HttpWebRequest = DirectCast(WebRequest.Create("URL"), HttpWebRequest)
postRequest.Method = "POST"
postRequest.KeepAlive = True
postRequest.CookieContainer = cookies
postRequest.ContentType = "application/x-www-form-urlencoded"
postRequest.ContentLength = byteData.Length
postRequest.Referer = "Referer Page"
postRequest.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; ru; rv:1.9.2.3) Gecko/20100401 Firefox/4.0 (.NET CLR 3.5.30729)"
Dim postreqstream As Stream = postRequest.GetRequestStream()
postreqstream.Write(byteData, 0, byteData.Length)
postreqstream.Close()
Dim postresponse As HttpWebResponse
postresponse = DirectCast(postRequest.GetResponse(), HttpWebResponse)
cookies.Add(postresponse.Cookies)
Dim postreqreader As New StreamReader(postresponse.GetResponseStream())
Dim thepage As String = postreqreader.ReadToEnd
Now, when I render the thepage variable in a browser control on a VB form, I can see the page that I want (the one containing the tables). At this point I simply passed the URL of that page to HtmlAgilityPack, like so:
Dim web As New HtmlAgilityPack.HtmlWeb()
Dim htmlDoc As HtmlAgilityPack.HtmlDocument = web.Load("URL")
Dim tabletag As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//table")
Dim tablenode As HtmlNode = htmlDoc.DocumentNode.SelectSingleNode("//table[@summary='List of services']")
If tabletag IsNot Nothing Then
    Console.WriteLine("YES")
End If
But the tabletag variable is Nothing. Where am I going wrong? Also, is there any way to get the URL directly from the HttpWebResponse so I can pass it to the web.Load method?
Thank you.
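One possibility, sketched under the assumption that the HTML already read into thepage is the page to parse: HtmlDocument.LoadHtml accepts markup as a string, so no second request is needed. The sampleHtml string and the names doc and tableNode below are placeholders for illustration and are not taken from the real site.

```vb
Imports System
Imports HtmlAgilityPack

Module ParseFromString
    Sub Main()
        ' Placeholder markup standing in for the contents of thepage (assumption).
        Dim sampleHtml As String =
            "<html><body><table summary='List of services'>" &
            "<tr><td>Example</td></tr></table></body></html>"

        ' Parse the string directly instead of calling web.Load(URL),
        ' which issues a fresh GET request without the posted form data.
        Dim doc As New HtmlDocument()
        doc.LoadHtml(sampleHtml)

        Dim tableNode As HtmlNode =
            doc.DocumentNode.SelectSingleNode("//table[@summary='List of services']")

        If tableNode IsNot Nothing Then
            Console.WriteLine("YES")
        End If
    End Sub
End Module
```

In the original code, this would amount to calling LoadHtml(thepage) on a new HtmlDocument. As for the URL, postresponse.ResponseUri holds the URI of the resource that answered the request, though loading that URI again would presumably repeat the request without the POST body.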