
I am working on an Internet Explorer add-on using BandObjects and a C# Windows Forms application, and I am testing out parsing HTML source code. So far I have been parsing information retrieved via the URL of the site.

I would like to get the HTML source of the current page of an example site that requires a login. If I use the URL of the page I am on, it always grabs the source of the login page rather than the actual page, because my app doesn't recognize that I have logged in. Would I need to store my login credentials for the site using some kind of API? Or is there a way to grab the HTML of the current page regardless? I would prefer the latter, as it seems like less trouble. Thanks!

Drew

1 Answer


I use this method in one of my apps:

// Requires: using System; using System.IO; using System.Net;
//           using System.Text; using System.Windows.Forms;
private static string RetrieveData(string url)
    {
        try
        {
            // prepare the web page we will be asking for
            var request = (HttpWebRequest)WebRequest.Create(url);

            /* Uncomment to access the site through a proxy:
            Uri proxyURI = new Uri("http://proxy.com:80");
            request.Proxy = new WebProxy(proxyURI);
            request.Proxy.Credentials = new NetworkCredential("proxyuser", "proxypassword");*/

            // execute the request; the using blocks release the
            // connection and the stream even if reading fails
            using (var response = (HttpWebResponse)request.GetResponse())
            using (Stream resStream = response.GetResponseStream())
            // decode as UTF-8 (an assumption about the page); ASCII decoding
            // would replace every non-ASCII character with '?'
            using (var reader = new StreamReader(resStream, Encoding.UTF8))
            {
                // read the whole response body into one string
                return reader.ReadToEnd();
            }
        }
        catch (Exception exception)
        {
            MessageBox.Show(@"Failed to retrieve data from the network. Please check your internet connection: " +
                            exception);
            // nothing usable was read
            return string.Empty;
        }
    }

Just pass the URL of the web page whose source you want to retrieve.

For example (the URL must include the scheme, such as http://, otherwise WebRequest.Create throws a UriFormatException):

string htmlSourceGoogle = RetrieveData("http://www.google.com");

Note: Uncomment the proxy configuration if you access the internet through a proxy, and replace the proxy address, username, and password with the ones you use.
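If you would rather not edit commented-out code, the proxy setup can probably be pulled into a small helper. This is only a sketch: `ProxyExample`, `CreateRequest`, and its parameters are names made up for illustration, and it assumes the proxy accepts a plain username and password.

```csharp
using System;
using System.Net;

static class ProxyExample
{
    // Hypothetical helper: builds the request and attaches a proxy
    // only when a proxy address is supplied.
    public static HttpWebRequest CreateRequest(
        string url, string proxyAddress, string proxyUser, string proxyPassword)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);

        if (!string.IsNullOrEmpty(proxyAddress))
        {
            // Same classes as in the commented-out block above.
            request.Proxy = new WebProxy(new Uri(proxyAddress))
            {
                Credentials = new NetworkCredential(proxyUser, proxyPassword)
            };
        }

        return request;
    }
}
```

Passing null or an empty string as the proxy address leaves the request on the default proxy settings.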

For logging in via code, check this question: Login to website, via C#
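As a rough illustration of what logging in from code usually involves, here is a minimal sketch that posts a login form and then requests the protected page with the same cookies. It is not taken from the linked question. The URLs, the form field names (`username`, `password`), and the names `LoginSketch` and `FetchAfterLogin` are all assumptions; check your site's login form for the real values.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class LoginSketch
{
    // Hypothetical URLs; replace with the ones your site uses.
    const string LoginUrl = "http://example.com/login";
    const string ProtectedUrl = "http://example.com/page?id=42";

    public static string FetchAfterLogin(string user, string password)
    {
        // One container shared by both requests, so the session
        // cookie set at login is sent with the second request.
        var cookies = new CookieContainer();

        // 1. POST the credentials as a URL-encoded form.
        var login = (HttpWebRequest)WebRequest.Create(LoginUrl);
        login.Method = "POST";
        login.ContentType = "application/x-www-form-urlencoded";
        login.CookieContainer = cookies;

        // Field names are assumptions; use the names from the real form.
        byte[] body = Encoding.UTF8.GetBytes(
            "username=" + Uri.EscapeDataString(user) +
            "&password=" + Uri.EscapeDataString(password));
        login.ContentLength = body.Length;

        using (Stream requestStream = login.GetRequestStream())
        {
            requestStream.Write(body, 0, body.Length);
        }

        // Completing the request stores any Set-Cookie values in the container.
        using (login.GetResponse()) { }

        // 2. GET the protected page with the same cookies.
        var page = (HttpWebRequest)WebRequest.Create(ProtectedUrl);
        page.CookieContainer = cookies;

        using (var response = (HttpWebResponse)page.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Some sites also expect hidden form fields (for example an anti-forgery token) to be posted along with the credentials, in which case you would first need to fetch the login page and read those values from it.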

reggie
  • Thanks so much, this does work for getting the source based on the URL (which I did have working initially). But because my site requires a login to view the specific page (for instance, a page with an id in the query string designating which page it is), it always retrieves the source of the login page: if you go to that URL without logging in, the site will not let you in. I'm not sure what to do about this, or whether there is anything I can do. – Drew Oct 20 '11 at 16:33
  • http://stackoverflow.com/questions/930807/c-sharp-login-to-website-via-program/931030#931030 – reggie Oct 20 '11 at 19:12
  • Still working with the example in the link to make it work for me, but that's essentially what I was looking for. Thank you! – Drew Oct 20 '11 at 19:43