
So, I've been scouring the web trying to learn more about how to log into websites programmatically using C#. I don't want to use a web client. I think I want to use something like HttpWebRequest and HttpWebResponse, but I have no idea how these classes work.

I guess I'm looking for someone to explain how they work and the steps required to successfully log in to, say, WordPress, an email account, or any site that requires that you fill in a form with a username and password.

Here's one of my attempts:

// Declare variables
        string url = textBoxGetSource.Text;
        string username = textBoxUsername.Text;
        string password = PasswordBoxPassword.Password;

        // Values for site login fields - username and password html ID's
        string loginUsernameID = textBoxUsernameID.Text;
        string loginPasswordID = textBoxPasswordID.Text;
        string loginSubmitID = textBoxSubmitID.Text;

        // Connection parameters
        string method = "POST";
        string contentType = @"application/x-www-form-urlencoded";
        string loginString = loginUsernameID + "=" + username + "&" + loginPasswordID + "=" + password + "&" + loginSubmitID;
        CookieContainer cookieJar = new CookieContainer();
        HttpWebRequest request;

        request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookieJar;
        request.Method = method;
        request.ContentType = contentType;
        request.KeepAlive = true;
        using (Stream requestStream = request.GetRequestStream())
        using (StreamWriter writer = new StreamWriter(requestStream))
        {
            writer.Write(loginString, username, password);
        }

        using (var responseStream = request.GetResponse().GetResponseStream())
        using (var reader = new StreamReader(responseStream))
        {
            var result = reader.ReadToEnd();
            Console.WriteLine(result);
            richTextBoxSource.AppendText(result);
        }

        MessageBox.Show("Successfully logged in.");

I don't know if I'm on the right track or not. Whatever site I try, I end up back at its login screen. I've downloaded Fiddler and was able to glean a little about what is sent to the server, but I feel completely lost. If anyone could shed some light here, I would greatly appreciate it.

DGarrett01

2 Answers


Logging into websites programmatically is difficult and tightly coupled to how the site implements its login procedure. Your code isn't working because it doesn't deal with any of those site-specific details (session cookies, hidden form values, redirects) in its requests and responses.

Let's take fif.com for example. When you type in a username and password, the following post request gets sent:

POST https://fif.com/login?task=user.login HTTP/1.1
Host: fif.com
Connection: keep-alive
Content-Length: 114
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Origin: https://fif.com
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.103 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Referer: https://fif.com/login?return=...==
Accept-Encoding: gzip,deflate
Accept-Language: en-US,en;q=0.8
Cookie: 34f8f7f621b2b411508c0fd39b2adbb2=gnsbq7hcm3c02aa4sb11h5c87f171mh3; __utma=175527093.69718440.1410315941.1410315941.1410315941.1; __utmb=175527093.12.10.1410315941; __utmc=175527093; __utmz=175527093.1410315941.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmv=175527093.|1=RegisteredUsers=Yes=1

username=...&password=...&return=aHR0cHM6Ly9maWYuY29tLw%3D%3D&9a9bd5b68a7a9e5c3b06ccd9b946ebf9=1

Notice the cookies (especially the first, your session token). Notice the cryptic URL-encoded return value being sent. If the server notices these are missing, it won't let you log in.

HTTP/1.1 400 Bad Request

Or worse, a 200 response of a login page with an error message buried somewhere inside.
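The cryptic return value in the POST body above is less mysterious than it looks. As a minimal sketch (the class and method names are illustrative, not part of the site or of any library), URL-decoding it and then Base64-decoding the result recovers a plain URL:

```csharp
using System;
using System.Text;

public static class ReturnValueDecoder
{
    // Assumption: the value is URL-encoded Base64 of a UTF-8 string,
    // which is what the captured request above appears to contain.
    public static string Decode(string urlEncodedBase64)
    {
        // "%3D%3D" becomes the Base64 padding "=="
        string base64 = Uri.UnescapeDataString(urlEncodedBase64);
        byte[] raw = Convert.FromBase64String(base64);
        return Encoding.UTF8.GetString(raw);
    }
}
```

For the captured value, ReturnValueDecoder.Decode("aHR0cHM6Ly9maWYuY29tLw%3D%3D") yields https://fif.com/. Other sites may encode their hidden values differently, or not at all.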

But let's just pretend you were able to collect all of those magic values and pass them in an HttpWebRequest object. The site wouldn't know the difference. And it might respond with something like this.

HTTP/1.1 303 See other
Server: nginx
Date: Wed, 10 Sep 2014 02:29:09 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Location: https://fif.com/

Hope you were expecting that. But if you've made it this far, you can now programmatically fire off requests to the server with your now-validated session token and get the expected HTML back.

GET https://fif.com/ HTTP/1.1
Host: fif.com
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.103 Safari/537.36
Referer: https://fif.com/login?return=aHR0cHM6Ly9maWYuY29tLw==
Accept-Encoding: gzip,deflate
Accept-Language: en-US,en;q=0.8
Cookie: 34f8f7f621b2b411508c0fd39b2adbb2=gnsbq7hcm3c02aa4sb11h5c87f171mh3; __utma=175527093.69718440.1410315941.1410315941.1410315941.1; __utmb=175527093.12.10.1410315941; __utmc=175527093; __utmz=175527093.1410315941.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmv=175527093.|1=RegisteredUsers=Yes=1
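One way to tell the 303 above apart from the "200 with an error page" case is to turn off automatic redirects and look at the status code and Location header yourself. The helper below is only a heuristic sketch: the names LoginResult and LooksSuccessful, and the idea of comparing the redirect target against a login path, are assumptions for illustration. A real site may signal success differently.

```csharp
using System;
using System.Net;

public static class LoginResult
{
    // Heuristic: a redirect to somewhere other than the login page is treated as success.
    public static bool LooksSuccessful(HttpStatusCode status, string location, string loginPath)
    {
        int code = (int)status;
        bool isRedirect = code >= 300 && code < 400;
        if (!isRedirect || string.IsNullOrEmpty(location))
        {
            return false; // e.g. 400, or 200 with the login form again
        }

        Uri target;
        if (!Uri.TryCreate(location, UriKind.Absolute, out target))
        {
            return false; // relative or malformed Location; not handled in this sketch
        }

        return !target.AbsolutePath.StartsWith(loginPath, StringComparison.OrdinalIgnoreCase);
    }
}
```

With HttpWebRequest you would set AllowAutoRedirect = false, cast the response to HttpWebResponse, and pass its StatusCode and Headers["Location"] to the helper.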

And this is all for fif.com - this juggling of cookies and tokens and redirects will be completely different for another site. In my experience (with that site in particular), you have three options to get through the login wall.

  1. Write an incredibly complicated and fragile script to dance around the site's procedures,
  2. Manually log into the site with your browser, grab the magic values, and plug them into your request objects, or
  3. Write a script that automates Selenium to do this for you.
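To give a feel for why option 1 is fragile, here is a rough sketch of just one step: pulling the hidden form values out of the login page's HTML so they can be sent back with the POST. The class name HiddenFields and the regex approach are assumptions for illustration. A regex will break on unusual markup, and a real script would also have to carry the session cookie between the GET and the POST.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class HiddenFields
{
    // Matches whole <input ...> tags; attributes are read separately because their order varies.
    private static readonly Regex InputTag = new Regex(@"<input\b[^>]*>", RegexOptions.IgnoreCase);

    // Reads one quoted attribute value from a tag, or null if it is absent.
    private static string Attr(string tag, string name)
    {
        Match m = Regex.Match(tag, @"\b" + name + @"\s*=\s*[""']([^""']*)[""']", RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Returns name/value pairs for every <input type="hidden"> found in the HTML.
    public static Dictionary<string, string> Extract(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in InputTag.Matches(html))
        {
            string tag = m.Value;
            string name = Attr(tag, "name");
            string type = Attr(tag, "type");
            if (name != null && string.Equals(type, "hidden", StringComparison.OrdinalIgnoreCase))
            {
                fields[name] = WebUtility.HtmlDecode(Attr(tag, "value") ?? "");
            }
        }
        return fields;
    }
}
```

The returned dictionary would be merged with the username and password fields before posting. Any change to the site's login form can silently break this.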

Selenium can handle all the juggling, and at the end you can pull the cookies out and fire off your requests normally. Here's an example for fif:

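// Requires the Selenium WebDriver package and a chromedriver executable in the given folder
// using OpenQA.Selenium; using OpenQA.Selenium.Chrome;

// Run Selenium: open the login page, fill in the form, and submit it
ChromeDriver cd = new ChromeDriver(@"chromedriver_win32");
cd.Navigate().GoToUrl("https://fif.com/login");
IWebElement e = cd.FindElement(By.Id("username"));
e.SendKeys("...");
e = cd.FindElement(By.Id("password"));
e.SendKeys("...");
e = cd.FindElement(By.XPath(@"//*[@id=""main""]/div/div/div[2]/table/tbody/tr/td[1]/div/form/fieldset/table/tbody/tr[6]/td/button"));
e.Click();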

CookieContainer cc = new CookieContainer();

//Get the cookies
foreach(OpenQA.Selenium.Cookie c in cd.Manage().Cookies.AllCookies)
{
    string name = c.Name;
    string value = c.Value;
    cc.Add(new System.Net.Cookie(name,value,c.Path,c.Domain));
}

// Fire off the request, reusing the cookies from the logged-in browser session
HttpWebRequest hwr = (HttpWebRequest)WebRequest.Create("https://fif.com/components/com_fif/tools/capacity/values/");
hwr.CookieContainer = cc;
hwr.Method = "POST";
hwr.ContentType = "application/x-www-form-urlencoded";
using (StreamWriter swr = new StreamWriter(hwr.GetRequestStream()))
{
    swr.Write("feeds=35");
}

// Read the response body into a string
string s;
using (WebResponse wr = hwr.GetResponse())
using (StreamReader sr = new StreamReader(wr.GetResponseStream()))
{
    s = sr.ReadToEnd();
}
Rich
xavier
  • Okay. I see what you mean. This exercise is my first foray into web programming. I'm mostly familiar with connecting to databases, and this is nothing like that. Seems like it's more trouble than it's worth. – DGarrett01 Sep 10 '14 at 03:14
  • Selenium is all I needed. It made incredibly short work of my problem. – minnow Jan 18 '15 at 20:39
  • Worked great for me logging into azure to get credits. It was missing the CookieContainer cc = new CookieContainer(); though – MrBeanzy Oct 21 '15 at 07:55

Check out this post. It's another way of doing it, and you don't need to install any package, although it might be easier with Selenium.

"You can continue using WebClient to POST (instead of GET, which is the HTTP verb you're currently using with DownloadString), but I think you'll find it easier to work with the (slightly) lower-level classes WebRequest and WebResponse.

There are two parts to this - the first is to post the login form, the second is recovering the "Set-cookie" header and sending that back to the server as "Cookie" along with your GET request. The server will use this cookie to identify you from now on (assuming it's using cookie-based authentication which I'm fairly confident it is as that page returns a Set-cookie header which includes "PHPSESSID").


POSTing to the login form

Form posts are easy to simulate; it's just a case of formatting your post data as follows:

field1=value1&field2=value2
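One detail the format above hides is that names and values need to be URL-encoded, otherwise a password containing & or = will corrupt the body. A minimal sketch, assuming a hypothetical helper named FormEncoder (not part of the framework):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormEncoder
{
    // Builds "name1=value1&name2=value2" with every name and value percent-encoded.
    // Note: Uri.EscapeDataString encodes a space as %20 rather than "+".
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value ?? "")));
    }
}
```

For example, a password of p&ss=1 is sent as p%26ss%3D1, so the server sees it as one value instead of splitting it at the ampersand.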

Using WebRequest and code I adapted from Scott Hanselman, here's how you'd POST form data to your login form:

string formUrl = "http://www.mmoinn.com/index.do?PageModule=UsersAction&Action=UsersLogin";

NOTE: This is the URL the form POSTs to, not the URL of the form (you can find this in the "action" attribute of the HTML's form tag).

string formParams = string.Format("email_address={0}&password={1}", "your email", "your password");
string cookieHeader;
WebRequest req = WebRequest.Create(formUrl);
req.ContentType = "application/x-www-form-urlencoded";
req.Method = "POST";
byte[] bytes = Encoding.ASCII.GetBytes(formParams);
req.ContentLength = bytes.Length;
using (Stream os = req.GetRequestStream())
{
    os.Write(bytes, 0, bytes.Length);
}
WebResponse resp = req.GetResponse();
cookieHeader = resp.Headers["Set-cookie"];

Here's an example of what you should see in the Set-cookie header for your login form:

PHPSESSID=c4812cffcf2c45e0357a5a93c137642e; path=/; domain=.mmoinn.com,wowmine_referer=directenter; path=/; domain=.mmoinn.com,lang=en; path=/;domain=.mmoinn.com,adt_usertype=other,adt_host=-
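The header above mixes cookie values with attributes such as path and domain, and only the name=value pairs belong in a Cookie request header. As a hedged alternative to passing the raw string along, a CookieContainer can parse it for you. This is a sketch under the assumption that the header is a comma-separated list of simple cookies like the one shown; the class name CookieHeaderDemo is made up for illustration.

```csharp
using System;
using System.Net;

public static class CookieHeaderDemo
{
    // Parses a Set-Cookie header received from siteUri and returns the
    // value to send back in a Cookie header for the same site.
    public static string ToCookieHeader(Uri siteUri, string setCookieHeader)
    {
        var jar = new CookieContainer();
        jar.SetCookies(siteUri, setCookieHeader);   // parses names, values, path, domain
        return jar.GetCookieHeader(siteUri);        // only the name=value pairs that apply
    }
}
```

In practice it is usually simpler to cast the request to HttpWebRequest and assign one shared CookieContainer to both the POST and the GET, so the cookies are carried over without touching the headers at all.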


GETting the page behind the login form

Now you can perform your GET request to a page that you need to be logged in for.

string pageSource;
string getUrl = "the url of the page behind the login";
WebRequest getRequest = WebRequest.Create(getUrl);
getRequest.Headers.Add("Cookie", cookieHeader);
WebResponse getResponse = getRequest.GetResponse();
using (StreamReader sr = new StreamReader(getResponse.GetResponseStream()))
{
    pageSource = sr.ReadToEnd();
}

EDIT:

If you need to view the results of the first POST, you can recover the HTML it returned with:

using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
{
    pageSource = sr.ReadToEnd();
}

Place this directly below cookieHeader = resp.Headers["Set-cookie"]; and then inspect the string held in pageSource."

Community
DFSFOT
  • Just copy-pasting the entirety of somebody else's answer isn't appropriate behaviour either (even if you link to it). – nkjt May 08 '15 at 15:01
  • @nkjt There's no way I can explain it better than him but still wanted to help people who get on this page... – DFSFOT May 08 '15 at 15:03
  • Please [edit] to use a quote block (select the text and use the `"` button while editing) to show readers that _all_ of the content of your post was written by someone else. – Edward May 08 '15 at 15:09
  • FYI this is what I described as the "fragile script dance", in essence a choreography of HTTP requests. But that script will break as soon as the site changes its login form/flow. And it gets even more complicated with HTTPS, which that post doesn't address. My advice was (and still is) that instead of programming around the application layer at the network layer, just use Selenium to "program" at the application layer instead. – xavier May 10 '15 at 02:50
  • @xavier Selenium is kinda annoying as ppl need to have the browser you code for installed and the browser actually pops up and does stuff which takes a lil more time... – DFSFOT May 11 '15 at 13:52
  • Nice explanation of some of the mechanics in WebRequest, but this really shouldn't work for any properly implemented login form. It assumes the site has no defense against cross-site request forgery. For example, ASP.NET Web Forms pages will require valid view state tokens, while MVC requires matching cookie and form validation tokens. Sites that just let you POST away are insecure. – Matthew Dec 08 '15 at 17:42