
I'm working on a project that web scrapes a table for work. My attempt to connect to the website using the C# WebClient library isn't working, because I need to first connect to the website and then simulate clicking the "Next" button to go to the next page of the table.

The code I'm using right now looks like this,

This is to connect to the website while looking up a name:

    string urlParams = "lastName=John&firstName=Doe&PropertyID=&Submit=Serch+Properties";
    using(WebClient client = new WebClient())
    {
        client.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";
        htmlResult = client.UploadString(url, urlParams);
    }

Then, once I have the initial search, I look to see if I can click Next using HtmlAgilityPack. If I can, I try to go to the next page by sending the paging parameters in the URL.

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(htmlResult);

    // I get the XPath from the Chrome dev tools: Inspect Element, then right-click and Copy XPath
    HtmlNode nextButton = doc.DocumentNode.SelectSingleNode(selectNodeXPath);
    if(nextButton != null && nextButton.InnerHtml == "Next")
    {
        // right now just trying to see the second page.
        urlParams = "lastName=John&firstName=Doe&PropertyID=&Submit=Serch+Properties&SearchLocation=" + 1;
        using(WebClient client = new WebClient())
        {
            client.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";
            htmlResult = client.UploadString(url, urlParams);
        }
    }

After I do this, htmlResult is null.

Jacob Loncar
  • Unfortunately this question is far too broad for Stack Overflow. You need to ask a specific question about a _specific_ problem – maccettura Aug 09 '18 at 19:40
  • Many sites try to protect themselves from these kind of functionalities. Perhaps it's best to see if the data provider has a simple API. – Stefan Aug 09 '18 at 19:45
  • You would have to have: a) The website and port to connect to. b) You would actually need *access* to said website. and c) You would have to know the URL of said database and you would have to use an SQL (I assume) or some type of connection string to connect to it. The only way you could do this without having direct access to the website would be if the website itself has an API used to retrieve records. – Chris Aug 09 '18 at 20:10
  • @maccettura Sorry, new to this whole thing; will keep this in mind for next time! – Jacob Loncar Aug 09 '18 at 22:35

2 Answers


If the database is a remote SQL Server database, then you could add the database to your project by selecting the "Code First from Database" option:

  1. Project -> Add New Item…
  2. Select Data from the left menu and then ADO.NET Entity Data Model
  3. Enter BloggingContext as the name and click OK
  4. This launches the Entity Data Model Wizard
  5. Select Code First from Database and click Next
  6. Enter database connection details and finish...

Then when you want to query the database you would instantiate the derived DbContext class generated by the wizard.

    . . .
    using (var ctx = new BloggingContext())
    {
        // Materialize the query before the context is disposed
        var members = ctx.Members.Where(x => x.LastName == "Jones").ToList();
        return members;
    }
    . . .

The BloggingContext can be found by searching for ": DbContext" in your entire solution.
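For orientation, the class the wizard generates is just a DbContext with one DbSet property per table. A rough sketch of what it might look like, assuming a Members table with a LastName column (the Member entity and its property names here are illustrative, not what the wizard will emit for your actual schema):

    using System.Data.Entity;

    public class Member
    {
        public int MemberId { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
    }

    public class BloggingContext : DbContext
    {
        // One DbSet per table; the query in the snippet above goes through this property
        public DbSet<Member> Members { get; set; }
    }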

Keith Harris

After doing some Googling I found the answer and saw that my first approach was really off.

I downloaded and installed Fiddler so I could see my exact web traffic and get an idea of how I needed to set up my requests.

How I used Fiddler:

  1. Connect to the website and enter the search (in my case, first and last name fields)
  2. Hit search
  3. Look at the web traffic that Fiddler has logged and see what the parameters are called and which ones to copy.
  4. Click the next button
  5. Repeat step 3.

To start, I switched from WebClient to HttpClient with a combination of KeyValuePairs.

The code is basically two steps: make the initial connection, then send a new key/value pair set for each page of the search results.

Basic code looks like this.


Step 1) Make initial connection

    // The handler's cookie container keeps the session between the two requests
    HttpClientHandler httpClientHandler = new HttpClientHandler();
    HttpClient client = new HttpClient(httpClientHandler);

    // Manually construct the form-encoded body
    var stringContent = new FormUrlEncodedContent(new[]
    {
        new KeyValuePair<string, string>("hJava", "Y"),
        new KeyValuePair<string, string>("SearchFirstName", firstName),
        new KeyValuePair<string, string>("SearchLastName", lastName),
        new KeyValuePair<string, string>("HomeState", state),
        new KeyValuePair<string, string>("frontpage", "1"),
        new KeyValuePair<string, string>("GO.x", "0"),
        new KeyValuePair<string, string>("GO.y", "0"),
        new KeyValuePair<string, string>("GO", "Go")
    });

    var response = client.PostAsync(url, stringContent).Result;
    var initialSearch = response.Content.ReadAsStringAsync().Result;

Step 2) Using the same instance of HttpClient, create a new request that resembles the first one, but add in the parts for clicking the next button

    // Same form content as the first request, plus the paging parameter (searchLocation) for the next page
    stringContent = new FormUrlEncodedContent(new[]
    {
        new KeyValuePair<string, string>("hJava", "Y"),
        new KeyValuePair<string, string>("searchLocation", "1"),
        new KeyValuePair<string, string>("SearchFirstName", firstName),
        new KeyValuePair<string, string>("SearchLastName", lastName),
        new KeyValuePair<string, string>("SearchStateID", state),
        new KeyValuePair<string, string>("GO.x", "0"),
        new KeyValuePair<string, string>("GO.y", "0"),
        new KeyValuePair<string, string>("GO", "Go")
    });

    response = client.PostAsync(url, stringContent).Result;
    var nextSearch = response.Content.ReadAsStringAsync().Result;

And that's really it. You can do this for all pages that result from the search; you just have to change new KeyValuePair<string, string>("searchLocation", "1"). In this example I would change the 1 to a 2.
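If you want to walk every page rather than just the second one, you can loop over the searchLocation value with the same client. The sketch below reuses the client, url, firstName, lastName and state variables from the snippets above; the MaxPages cap and the simple "Next" string check are my own assumptions, not something the site requires:

    // Page through the results by incrementing searchLocation until no Next button is returned.
    // MaxPages is an assumed safety cap so the loop cannot run forever.
    const int MaxPages = 50;
    var pages = new List<string>();

    for (int page = 1; page <= MaxPages; page++)
    {
        var pageContent = new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("hJava", "Y"),
            new KeyValuePair<string, string>("searchLocation", page.ToString()),
            new KeyValuePair<string, string>("SearchFirstName", firstName),
            new KeyValuePair<string, string>("SearchLastName", lastName),
            new KeyValuePair<string, string>("SearchStateID", state),
            new KeyValuePair<string, string>("GO.x", "0"),
            new KeyValuePair<string, string>("GO.y", "0"),
            new KeyValuePair<string, string>("GO", "Go")
        });

        var pageResponse = client.PostAsync(url, pageContent).Result;
        var pageHtml = pageResponse.Content.ReadAsStringAsync().Result;
        pages.Add(pageHtml);

        // Stop once the returned page no longer advertises a Next button
        if (!pageHtml.Contains("Next"))
            break;
    }

Stopping on the absence of "Next" mirrors the HtmlAgilityPack check from the question; parsing the returned HTML for the actual button node would be more robust than a plain Contains call.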

Jacob Loncar