I am trying to log into my account on a website using WebRequest. I have read enough about it that I wanted to take an example and learn by trial and error.

This is the example I am using: "Login to website, via C#".

When I execute my code, it throws this unhandled exception:

System.Net.WebException: 'The remote server returned an error: (404) Not Found.'

I tried stepping through the code and I think it might be trying to POST somewhere it can't. I wanted to fix this before moving on to getting confirmation that it successfully logged in. I changed the username and password to dummy text for the sake of this question.

What did I do wrong here, and what's the most logical way of fixing this issue? Thanks in advance.

ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

string formUrl = "https://secure.runescape.com/m=weblogin/login.ws"; // NOTE: This is the URL the form POSTs to, not the URL of the form (you can find this in the "action" attribute of the HTML's form tag)
string formParams = string.Format("login-username={0}&login-password={1}", "myUsername", "password");
string cookieHeader;
WebRequest req = WebRequest.Create(formUrl);
req.ContentType = "application/x-www-form-urlencoded";
req.Method = "POST";
byte[] bytes = Encoding.ASCII.GetBytes(formParams);
req.ContentLength = bytes.Length;
using (Stream os = req.GetRequestStream())
{
    os.Write(bytes, 0, bytes.Length);
}
WebResponse resp = req.GetResponse();

cookieHeader = resp.Headers["Set-cookie"];
moondaisy
Jordan Jones

1 Answer

When you scrape a website, you have to make sure you mimic everything a browser does. That includes any client-side state (cookies) that is set before the form is POSTed. As most sites don't like to be scraped or steered by bots, they are often rather picky about the payload they accept. The same is true for the site you're trying to control.

Three important things you have missed:

  • You didn't start with an initial GET, so you don't have the required cookies in a CookieContainer.
  • On the POST you missed a header (the referrer) and three hidden fields in the form.
  • The form fields are named username and password (as can be seen in the name attribute of the input tags). You used the ids instead.
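One further detail when assembling the form fields, which the original question does not mention: values pasted into the body with string.Format are not escaped, so a password containing a character such as & or = would be split into the wrong fields. A minimal sketch of a helper that percent-encodes each name and value, assuming a hypothetical FormBody class that you would add yourself:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class FormBody
{
    // Joins name/value pairs into an application/x-www-form-urlencoded body.
    // Uri.EscapeDataString percent-encodes characters such as '&', '=' and
    // spaces so they cannot be mistaken for field separators.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }
}
```

Calling FormBody.Build with the pairs username, password, mod, ssl and dest would produce a string that can be used in place of the string.Format result.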

Fixing those omissions will result in the following code:

ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
string useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36";

// capture cookies, this is important!
var cookies = new CookieContainer();

// do a GET first, so you have the initial cookies needed
string loginUrl = "https://secure.runescape.com/m=weblogin/loginform.ws?mod=www&ssl=0&dest=community";
// HttpWebRequest
var reqLogin = (HttpWebRequest) WebRequest.Create(loginUrl);
// minimal needed settings
reqLogin.UserAgent = useragent;
reqLogin.CookieContainer = cookies;

reqLogin.Method = "GET";
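// the cookies land in the container when the response arrives; dispose the response to free the connection
using (var loginResp = reqLogin.GetResponse())
{
    //loginResp.Dump(); // LinqPad testing
}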

string formUrl = "https://secure.runescape.com/m=weblogin/login.ws"; // NOTE: This is the URL the form POSTs to, not the URL of the form (you can find this in the "action" attribute of the HTML's form tag)
// in the HTML the form has 3 more hidden fields, those are needed as well
string formParams = string.Format("username={0}&password={1}&mod=www&ssl=0&dest=community", "myUsername", "password");
string cookieHeader;
// notice the cast to HttpWebRequest
var req = (HttpWebRequest) WebRequest.Create(formUrl);

// put the earlier cookies back on the request
req.CookieContainer = cookies;

// the referrer is mandatory, without it a timeout is raised
// (the HTTP header is spelled "Referer"; HttpWebRequest exposes it as a property)
req.Referer = "https://secure.runescape.com/m=weblogin/loginform.ws?mod=www&ssl=0&dest=community";
req.UserAgent = useragent;

req.ContentType = "application/x-www-form-urlencoded";
req.Method = "POST";
byte[] bytes = Encoding.ASCII.GetBytes(formParams);
req.ContentLength = bytes.Length;
using (Stream os = req.GetRequestStream())
{
    os.Write(bytes, 0, bytes.Length);
}
WebResponse resp = req.GetResponse();

cookieHeader = resp.Headers["Set-cookie"];

For me this returns a successful response. It is up to you to parse the resulting HTML to plan your next steps.
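As a starting point for that parsing step, here is a minimal sketch that reads the response body into a string and looks for a failure message. The LoginCheck class, the UTF-8 assumption and the marker text are all hypothetical; look at what the real page returns to pick a reliable marker.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class LoginCheck
{
    // Reads an entire response stream into a string (UTF-8 is assumed here).
    public static string ReadBody(Stream responseStream)
    {
        using (var reader = new StreamReader(responseStream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Returns true when the page does NOT contain the failure marker.
    // The marker is an assumption; the real site may word its error differently.
    public static bool LooksLoggedIn(string html, string failureMarker)
    {
        return html.IndexOf(failureMarker, StringComparison.OrdinalIgnoreCase) < 0;
    }
}
```

With the code above you could call LoginCheck.ReadBody(resp.GetResponseStream()) and pass the result to LooksLoggedIn together with whatever error text the login page shows on a failed attempt.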

rene
  • Thank you so much for commenting the code as well, it's going to make it 100x easier for me to learn. What I'm thinking now is that I can use the resp to check whether it contains something like "successfully logged in" or something. – Jordan Jones Aug 05 '17 at 08:43
  • You said that this returned a success for you? I've stepped through the code multiple times, tried changing a few things, and it still returns an "incorrect name or password" even though it's 100% correct. Any thoughts? – Jordan Jones Aug 05 '17 at 19:09
  • I don't have a valid username/password so I didn't check that. I fixed your initial error. The rest of the scraping is up to you, but if you open the Dev Console in Chrome you can follow what happens in the browser and then mimic that with WebRequests. Not easy, but doable. – rene Aug 05 '17 at 19:13
  • I think I know what's happening: it finds the username box and the password box, but there is only 1 value attribute and it's for the username. The password one looks like this `` as you can see there is no "value" attribute as there is in the username one, so I think it tries logging in without a password. – Jordan Jones Aug 05 '17 at 19:18
  • @JordanJones yep, you used the wrong names for the fields. The values that are in the name attribute go on the wire, not the ids. Now fixed. – rene Aug 05 '17 at 19:24
  • How did you find the formParams? Where did you get them from? o: – Jordan Jones Aug 07 '17 at 22:25
  • @JordanJones it is all in the html of that login page, there is no magic, just some Sherlock Holmes skills. – rene Aug 08 '17 at 06:00
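To make that detective work repeatable, the field names can also be pulled out of the login page programmatically. The sketch below is only an illustration: the FormFields class is hypothetical, it assumes double-quoted attributes, and regular expressions are a fragile way to read HTML, so a real HTML parser is a safer choice for anything beyond a quick look.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class FormFields
{
    // Returns name -> value for every <input> tag that has a name attribute.
    // Inputs without a value attribute (such as a password box) map to "".
    public static Dictionary<string, string> Extract(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match tag in Regex.Matches(html, @"<input\b[^>]*>", RegexOptions.IgnoreCase))
        {
            Match name = Regex.Match(tag.Value, @"\sname\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);
            if (!name.Success) continue; // inputs without a name are not submitted
            Match value = Regex.Match(tag.Value, @"\svalue\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);
            fields[name.Groups[1].Value] = value.Success ? value.Groups[1].Value : "";
        }
        return fields;
    }
}
```

Running Extract on the HTML returned by the initial GET would list the keys that belong in formParams, including the hidden ones, keyed by name rather than id.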