
I'm trying to grab the HTML from a webpage, but the page requires a login. I'm already logged in through a browser (IE), but I believe my code doesn't share that browser's session, which is why it gets the login page instead.

This is what I did so far:

    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    public void HTMLImport()
    {
        string urlAddress = "https://randomWebsite.com/reports/show_report.aspx";

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            if (response.StatusCode == HttpStatusCode.OK)
            {
                // Use the charset the server reports; otherwise fall back to UTF-8
                // (this StreamReader constructor still honors a byte-order mark).
                Encoding encoding = string.IsNullOrEmpty(response.CharacterSet)
                    ? Encoding.UTF8
                    : Encoding.GetEncoding(response.CharacterSet);

                using (StreamReader readStream = new StreamReader(response.GetResponseStream(), encoding))
                {
                    string data = readStream.ReadToEnd();

                    // This prints the HTML a visitor who is NOT logged in sees,
                    // even though I am logged in to the site in IE.
                    Console.WriteLine(data);
                }
            }
        }
    }
  • http://www.developerfusion.com/project/98472/nsoup/ – Susheel Singh Oct 20 '15 at 03:01
  • Do you mean you are logged in to IE and then running the application in another browser? If so, this is expected behavior. – Nikhil Vartak Oct 20 '15 at 03:02
  • @SusheelSingh OP is not asking "How to parse HTML?" – Nikhil Vartak Oct 20 '15 at 03:03
  • http://stackoverflow.com/questions/1453560/c-sharp-keep-session-id-over-httpwebrequest – Eric J. Oct 20 '15 at 03:03
  • Do you know what kind of authentication the webpage requires? Once you figure that out, you just need to populate the right authentication properties in the request header and you should be good to go. (http://www.telerik.com/fiddler will be your good friend here) – Xiaomin Wu Oct 20 '15 at 03:03
  • I'm already on the page I want in IE; now I just need my code to grab its HTML and store it in a variable. –  Oct 20 '15 at 03:05
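Xiaomin Wu's suggestion depends on which authentication scheme the site uses. If it happens to be HTTP Basic authentication (an assumption: a page like show_report.aspx more often uses a forms login with cookies), the header is just `Basic ` followed by base64 of `user:password`. A minimal sketch; `BasicAuth`, `HeaderValue` and `CreateRequest` are hypothetical names:

```csharp
using System;
using System.Net;
using System.Text;

public static class BasicAuth
{
    // Builds an HTTP Basic "Authorization" value: "Basic " + base64("user:password").
    // Only useful if the site really uses Basic auth (an assumption in this sketch).
    public static string HeaderValue(string user, string password)
    {
        byte[] raw = Encoding.UTF8.GetBytes(user + ":" + password);
        return "Basic " + Convert.ToBase64String(raw);
    }

    // Creates a request with the Authorization header already attached.
    // Nothing is sent until GetResponse() is called on the returned request.
    public static HttpWebRequest CreateRequest(string url, string user, string password)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Headers[HttpRequestHeader.Authorization] = HeaderValue(user, password);
        return request;
    }
}
```

Fiddler will show whether the browser actually sends an `Authorization` header; if it sends only cookies, this approach does not apply.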

2 Answers

You will need to perform the login from your C# code, for example by posting the login form back to the server with valid credentials (too long to write the code here), and then reading the session cookie from the response. Most login handlers reply with an authentication cookie that you must include in every further request.

Without more details, unfortunately I can't help further.
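A minimal sketch of that login POST, assuming the form posts plain fields to a URL you supply. `LoginPoster`, `BuildFormBody`, `PostLogin` and the field names in the usage are hypothetical; take the real names from the form's HTML (or from Fiddler). If the login page is an ASP.NET Web Forms page, it will likely also expect hidden fields such as `__VIEWSTATE` and `__EVENTVALIDATION` copied from the page, which this sketch does not extract.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;

public static class LoginPoster
{
    // Encodes form fields as application/x-www-form-urlencoded ("a=1&b=2"),
    // percent-escaping keys and values.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    // POSTs the login form. Because the request has a CookieContainer, any
    // Set-Cookie headers in the response (e.g. an authentication cookie) are
    // stored in 'cookies' and sent on later requests that use the same container.
    public static CookieCollection PostLogin(string loginUrl, string formBody, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(loginUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = cookies;

        byte[] bytes = Encoding.UTF8.GetBytes(formBody);
        request.ContentLength = bytes.Length;
        using (Stream body = request.GetRequestStream())
        {
            body.Write(bytes, 0, bytes.Length);
        }

        using (var response = (HttpWebResponse)request.GetResponse())
        {
            // Cookies the container will now send back to the login URL.
            return cookies.GetCookies(new Uri(loginUrl));
        }
    }
}
```

For example, post `BuildFormBody` of hypothetical `username`/`password` fields with a new `CookieContainer`, then assign that same container to the `HttpWebRequest` for the report page so the authentication cookie goes along with it.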

Bishoy

To do this, first of all you should know that websites usually use a cookie to hold the session.

  1. Send a request to the web server and read the response; you will find a session ID in the response headers (in ASP.NET this is usually the ASP.NET_SessionId cookie).
  2. Send a login request to the web server that posts the username and password. Include the ASP.NET_SessionId cookie in this request and in every following request.
  3. Request "https://randomWebsite.com/reports/show_report.aspx" with the ASP.NET_SessionId cookie; the web server will now treat you as logged in.
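The three steps can be sketched with one `CookieContainer` shared by every request, which is how `HttpWebRequest` carries a session cookie from one request to the next. `SessionFlow`, its methods, and the login URL and form body you pass in are hypothetical; step 2's body must match the real login form (for a Web Forms page, including hidden fields from step 1's HTML). `CookieHeaderAfter` illustrates the cookie mechanism offline, without contacting any server.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class SessionFlow
{
    // Sends one request that shares the given CookieContainer: cookies the
    // server sets (e.g. ASP.NET_SessionId) are stored and sent back next time.
    // A non-null formBody turns the request into a form POST.
    static string Send(string url, CookieContainer cookies, string formBody = null)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        if (formBody != null)
        {
            request.Method = "POST";
            request.ContentType = "application/x-www-form-urlencoded";
            byte[] bytes = Encoding.UTF8.GetBytes(formBody);
            request.ContentLength = bytes.Length;
            using (Stream s = request.GetRequestStream())
                s.Write(bytes, 0, bytes.Length);
        }
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }

    // The three steps, all on one CookieContainer.
    public static string FetchReport(string loginUrl, string loginFormBody, string reportUrl)
    {
        var cookies = new CookieContainer();
        Send(loginUrl, cookies);                // 1. GET: server issues ASP.NET_SessionId
        Send(loginUrl, cookies, loginFormBody); // 2. POST credentials; session cookie is sent
        return Send(reportUrl, cookies);        // 3. GET the report with the same cookies
    }

    // Offline illustration: store a Set-Cookie value for a site and return
    // the Cookie header the container would send on the next request there.
    public static string CookieHeaderAfter(string siteUrl, string setCookie)
    {
        var cookies = new CookieContainer();
        var uri = new Uri(siteUrl);
        cookies.SetCookies(uri, setCookie);
        return cookies.GetCookieHeader(uri);
    }
}
```

If `FetchReport` returns the login page rather than the report, the POST in step 2 probably differs from what the browser sends; Fiddler makes the two easy to compare.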
Irshad
theone