How do I scrape web content from secure sites after authenticating?

Question

I'm trying to scrape details from websites by posting form values and parsing the response page I get back, but I'm running into problems in certain situations.

For non-financial sites like Slashdot.org, the code below works fine, even when I log in and scrape my own account details as a test.

But with financial sites, the same code returns a response page with text such as "For your security, you must enable JavaScript to sign on to your account" and none of the financial details I expect to see, such as account balances. Note that the response status code is still "OK" (200) in this scenario, so it doesn't point to any obvious error. I want to build a personal "Mint.com"-type site, but this has made it quite challenging.

Assuming financial sites have more robust security controls, how can I change the following code snippet to better handle these scenarios?

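// I call this method to do the heavy lifting.
// httpClient is declared elsewhere (assumed to be a System.Net.Http.HttpClient instance).
public async Task<string> GetHttpResponseMessage(string url, IEnumerable<KeyValuePair<string, string>> values)
{
    // Encode the form fields as application/x-www-form-urlencoded and POST them.
    var content = new FormUrlEncodedContent(values);
    var response = await httpClient.PostAsync(url, content);

    // Return the raw HTML of the response page for parsing.
    return await response.Content.ReadAsStringAsync();
}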
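
Because the status code is 200 even when the site refuses the sign-on, the failure is easy to miss. A minimal sketch of one way to surface it is shown here, assuming the "enable JavaScript" wording quoted above is a usable marker (a real site may word it differently). The names `ScrapeHelpers`, `CreateClient`, `LooksLikeJavaScriptWall`, and `PostFormAsync` are hypothetical and not part of the original code. The client also keeps cookies between requests, which a form login generally relies on. This only detects the problem; it does not execute the page's scripts.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class ScrapeHelpers
{
    // Assumption: marker text taken from the message quoted in the question.
    private const string JavaScriptMarker = "enable JavaScript";

    // Builds a client that stores cookies set at login and sends them on later requests.
    public static HttpClient CreateClient()
    {
        var handler = new HttpClientHandler
        {
            CookieContainer = new CookieContainer(),
            UseCookies = true,
            AllowAutoRedirect = true
        };
        return new HttpClient(handler);
    }

    // True when the body contains the marker text (case-insensitive).
    public static bool LooksLikeJavaScriptWall(string html)
    {
        return html != null &&
               html.IndexOf(JavaScriptMarker, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Posts the form and throws instead of silently returning an unusable page.
    public static async Task<string> PostFormAsync(
        HttpClient client, string url, IEnumerable<KeyValuePair<string, string>> values)
    {
        using (var content = new FormUrlEncodedContent(values))
        using (var response = await client.PostAsync(url, content))
        {
            response.EnsureSuccessStatusCode();
            var body = await response.Content.ReadAsStringAsync();
            if (LooksLikeJavaScriptWall(body))
            {
                throw new InvalidOperationException(
                    "The site returned a page that requires JavaScript; a script engine is needed.");
            }
            return body;
        }
    }
}
```

A caller would create one client with `ScrapeHelpers.CreateClient()`, reuse it for the login POST and every later request, and treat the `InvalidOperationException` as the signal that plain HTTP posting is not enough for that site.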

See [this question](https://stackoverflow.com/questions/11393075/running-scripts-in-htmlagilitypack) and [this question](https://stackoverflow.com/questions/10886161/load-a-dom-and-execute-javascript-server-side-with-net/10886733#10886733) for information on how to run page scripts. You are going to need a script engine, such as the one in your browser, so the accepted answer suggests using a WebBrowser control. — John Wu, May 14 '19 at 01:53
I find that using multiple webpages works very well in cases like this. I keep one webpage on the main pages and then use a second webpage to scrape the children. You can always make the main page invisible when going to the children. — jdweng, May 14 '19 at 02:24
Thanks, John Wu, these seem to be more what I am looking for. — Yoav, May 17 '19 at 11:13

0 Answers