
Is there any way to get the HTML of a web page even when the response status is 404? Some pages still return text in the body along with the 404, and in my case I need to read that text.

Example C# code for getting HTML:

    public static string GetHtmlFromUri(string resource)
    {
        string html = string.Empty;
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(resource);
        // GetResponse() is where the WebException is thrown (see the stack trace below).
        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        {
            // Treat any 2xx status code as success.
            bool isSuccess = (int)resp.StatusCode >= 200 && (int)resp.StatusCode < 300;
            if (isSuccess)
            {
                using (StreamReader reader = new StreamReader(resp.GetResponseStream()))
                {
                    html = reader.ReadToEnd();
                }
            }
        }
        return html;
    }

Here is a page that I've created to test this with 404 errors: http://bypass.rd.to/headertest.php
If you inspect the response headers, you will see that the status is 404, yet the body still contains readable text. Now try to fetch the page in C#:

MessageBox.Show(GetHtmlFromUri("http://bypass.rd.to/headertest.php"));

System.Net.WebException was unhandled
Message="The remote server returned an error: (404) Not Found."
Source="System"
StackTrace: at System.Net.HttpWebRequest.GetResponse()

E3pO

1 Answer


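GetResponse() throws a WebException for a 404, but that exception still carries the server's reply: its Response property holds the HttpWebResponse, from which you can access everything that was sent back, including the status code, the headers, and the body. See this answer for an example.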
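
As a rough illustration of that idea, here is a minimal sketch of the question's method reworked to read the body from the exception. The class name `HtmlFetcher` and the helper `ReadBody` are made up for this example, and the choice to rethrow when there is no response at all (DNS failure, timeout, and so on) is an assumption about what you would want, not something the question specifies.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical wrapper class, named only for this example.
public static class HtmlFetcher
{
    // Returns the response body whether the status is 2xx or an error such as 404.
    public static string GetHtmlFromUri(string resource)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(resource);
        try
        {
            using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
            {
                return ReadBody(resp);
            }
        }
        catch (WebException ex)
        {
            // For protocol errors (404, 500, ...) the server's reply is attached
            // to the exception. Response is null when nothing came back at all.
            HttpWebResponse errorResp = ex.Response as HttpWebResponse;
            if (errorResp == null)
            {
                throw; // Assumption: a missing response is still treated as a failure.
            }
            using (errorResp)
            {
                return ReadBody(errorResp);
            }
        }
    }

    // Hypothetical helper: reads the whole response stream as text.
    private static string ReadBody(HttpWebResponse resp)
    {
        using (Stream stream = resp.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With this in place, the call from the question, `MessageBox.Show(HtmlFetcher.GetHtmlFromUri("http://bypass.rd.to/headertest.php"));`, should show the page text instead of throwing. If you also need to know that the page was a 404, `errorResp.StatusCode` is available inside the catch block.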

GraemeF