I am trying to create a small application that logs into a website, crawls the site and saves various pieces of data. I am having trouble creating the web client that logs in to the website. I have looked at the various solutions presented here on the site and at those suggested in the first five pages of a Google search. None of them has worked.

The site is running .NET, so I am wondering if it is the viewstate that is causing issues?

Using the solution offered in "Login to website, via C#", I can only see the SessionId cookie, not the .ASPXAUTH cookie that should be there once logged in.

Suggestions?

Kasper Wittrup
  • Every website does authentication differently so there's no guarantee that a method on one website will work on another, I'm going to vote to close, apologies! – JMK May 01 '15 at 10:43
  • http://stackoverflow.com/questions/1777221/using-cookiecontainer-with-webclient-class – Paolo Costa May 01 '15 at 10:44
  • The fact that it is a .NET website is *largely* irrelevant; http is http is http; however, it is very unlikely that the site owners *want* you to do this - if they did, they would have created an API – Marc Gravell May 01 '15 at 10:49

2 Answers

I would suggest following this plan:

1) Install and run Fiddler.

2) Clear the browser's cache and cookies.

3) Go to your page, log in and see what happens in Fiddler: inspect the requests and responses, redirects, etc.

In most cases the flow is:

GET login page ->

POST credentials to the authorisation page and get cookies/hash in response ->

GET the authorised page using this cookie/hash.

Once you know the steps, they are easy to reproduce using WebClient or, even better, HttpWebRequest and HttpWebResponse.

See my answer for help
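
The GET -> POST -> GET flow above can be sketched roughly as follows. This is only a minimal sketch under assumptions: it uses HttpClient with a shared CookieContainer instead of WebClient or HttpWebRequest, the class and method names (`LoginSketch`, `LoginAndFetchAsync`, `HasCookie`) are hypothetical, and the URLs and form field names must be taken from what Fiddler shows for the actual site. The `.ASPXAUTH` cookie name comes from the question; a site may be configured to use a different name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class LoginSketch
{
    // True if the container holds a cookie with this name for the given site.
    public static bool HasCookie(CookieContainer jar, Uri site, string name)
    {
        return jar.GetCookies(site).Cast<Cookie>().Any(c => c.Name == name);
    }

    // Runs the three steps through one handler so that cookies set by one
    // response are sent with the next request. All parameters are
    // placeholders to be filled in from the traffic observed in Fiddler.
    public static async Task<string> LoginAndFetchAsync(
        Uri loginUri,
        Uri protectedUri,
        IDictionary<string, string> formFields,
        CookieContainer jar)
    {
        var handler = new HttpClientHandler
        {
            CookieContainer = jar,
            AllowAutoRedirect = true
        };

        using (var client = new HttpClient(handler))
        {
            // Step 1: GET the login page (this usually sets the session cookie).
            await client.GetStringAsync(loginUri);

            // Step 2: POST the credentials, plus any hidden fields the form needs.
            var response = await client.PostAsync(
                loginUri, new FormUrlEncodedContent(formFields));
            response.EnsureSuccessStatusCode();

            // Assumption: a successful login sets a cookie named .ASPXAUTH.
            if (!HasCookie(jar, loginUri, ".ASPXAUTH"))
                throw new InvalidOperationException("No .ASPXAUTH cookie after login.");

            // Step 3: GET the authorised page; the cookies are sent automatically.
            return await client.GetStringAsync(protectedUri);
        }
    }
}
```

Keeping every request on the same CookieContainer matters here: if the login response redirects, a client without a shared cookie store can lose the authentication cookie set on the redirect response and end up with only the session cookie.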

VladL
  • What happens if they are verifying any secret tokens? e.g. Anti-Forgery token – Jamie Rees May 01 '15 at 10:53
  • @JamieRees an antiforgery token will be sent in the POST step as well so you have to parse the first GET response where it must be on the page and send it in POST – VladL May 01 '15 at 10:56
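
The token step discussed in the comments above (parse the first GET response, then send the values back in the POST) might look something like the sketch below. It is illustrative only: the `HiddenFields` class is hypothetical, and a regular expression is used just to keep the example dependency-free; a proper HTML parser would be more robust. On ASP.NET sites the hidden fields typically include `__VIEWSTATE` and `__EVENTVALIDATION` (Web Forms) or `__RequestVerificationToken` (MVC anti-forgery), but the actual names should be checked in Fiddler.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class HiddenFields
{
    // Matches whole <input ...> tags; attributes are read separately so
    // their order inside the tag does not matter.
    private static readonly Regex InputTag =
        new Regex(@"<input\b[^>]*>", RegexOptions.IgnoreCase);

    // Returns the HTML-decoded value of a quoted attribute, or null if absent.
    private static string Attr(string tag, string name)
    {
        var m = Regex.Match(
            tag,
            @"\b" + name + @"\s*=\s*([""'])(.*?)\1",
            RegexOptions.IgnoreCase);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[2].Value) : null;
    }

    // Collects name/value pairs of all <input type="hidden"> elements.
    public static Dictionary<string, string> Extract(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in InputTag.Matches(html))
        {
            var type = Attr(m.Value, "type");
            var name = Attr(m.Value, "name");
            if (name != null &&
                string.Equals(type, "hidden", StringComparison.OrdinalIgnoreCase))
            {
                fields[name] = Attr(m.Value, "value") ?? "";
            }
        }
        return fields;
    }
}
```

The resulting dictionary can be merged with the username and password fields before the POST, so the server receives the same hidden values a browser would have submitted.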
I would suggest using browser automation software such as Selenium to do this. This way you can actually simulate the browser, log in, and then scrape the data.

Here is a good example of how to do this: http://scraping.pro/example-of-scraping-with-selenium-webdriver-in-csharp/
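
For orientation, a login with Selenium WebDriver in C# could look roughly like this. It is a sketch under assumptions rather than a drop-in solution: it needs the Selenium.WebDriver NuGet package and a local Chrome with a matching driver, the `SeleniumLoginSketch` class is hypothetical, and the element locators (`username`, `password`, the submit button selector) are guesses that must be replaced with the ones on the real login form.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

public static class SeleniumLoginSketch
{
    // Logs in through a real browser and returns the HTML of targetUrl.
    public static string LoginAndGetPage(
        string loginUrl, string targetUrl, string user, string password)
    {
        IWebDriver driver = new ChromeDriver();
        try
        {
            driver.Navigate().GoToUrl(loginUrl);

            // Assumed field names; inspect the login form to find the real ones.
            driver.FindElement(By.Name("username")).SendKeys(user);
            driver.FindElement(By.Name("password")).SendKeys(password);
            driver.FindElement(By.CssSelector("input[type='submit']")).Click();

            // The browser handles hidden fields, redirects and cookies itself.
            // Assumption: a successful login sets a cookie named .ASPXAUTH.
            var auth = driver.Manage().Cookies.GetCookieNamed(".ASPXAUTH");
            if (auth == null)
                throw new InvalidOperationException("No .ASPXAUTH cookie after login.");

            driver.Navigate().GoToUrl(targetUrl);
            return driver.PageSource;
        }
        finally
        {
            // Always close the browser, even if a step above throws.
            driver.Quit();
        }
    }
}
```

The trade-off compared with plain HTTP requests is speed and weight: a real browser is slower to drive, but nothing about the login flow has to be reverse-engineered.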

Jamie Rees