I'm trying to scrape details from websites by posting form values and parsing the response page that comes back, but I'm running into problems in certain situations.
For non-financial sites like Slashdot.org, the code below works fine, even when I log in and scrape my own account details as a test.
But with financial sites, the same code returns a response page containing text like "For your security, you must enable JavaScript to sign on to your account", and none of the financial details I would expect to see, such as account balances. The response status code is still 200 (OK) in this scenario, so the status alone doesn't indicate an error. I want to build a personal "Mint.com"-type site, but this has made it quite challenging.
Assuming financial sites have more robust security controls, what can I change in the following code snippet to better handle these scenarios?
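    // I call this method to do the heavy lifting.
    // httpClient is a shared System.Net.Http.HttpClient field declared elsewhere in the class.
    public async Task<string> GetHttpResponseMessage(string url, IEnumerable<KeyValuePair<string, string>> values)
    {
        // Encode the form fields as application/x-www-form-urlencoded.
        var content = new FormUrlEncodedContent(values);
        // POST the form and wait for the server's response.
        var response = await httpClient.PostAsync(url, content);
        // Read the response body as a string so the caller can parse it.
        var responseString = await response.Content.ReadAsStringAsync();
        return responseString;
    }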
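
Because the status code is 200 either way, inspecting the body is one way to tell the JavaScript-required page apart from a real account page. The following is only a minimal sketch, assuming the page contains the phrase quoted above; `ResponseChecks` and `LooksLikeJavaScriptWall` are hypothetical names for illustration and are not part of my existing code.

```csharp
using System;

public static class ResponseChecks
{
    // Hypothetical helper: returns true when the body appears to be the
    // "enable JavaScript" page rather than the expected account page.
    // The marker phrase is an assumption taken from the message I see;
    // other sites will likely word it differently.
    public static bool LooksLikeJavaScriptWall(string responseBody)
    {
        if (string.IsNullOrEmpty(responseBody))
        {
            return false;
        }

        // Case-insensitive substring search for the marker phrase.
        return responseBody.IndexOf(
            "you must enable JavaScript",
            StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

The string returned by GetHttpResponseMessage could be passed to this check before any parsing is attempted. It only detects the situation; it does not get past it.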