
I have a site that returns a custom 404 page. I need to get the source code of that page and determine what kind of 404 it's returning. Is there a way to get the source code of the 404 page?

try
{
    using (var webClient = new WebClient())
    {
        webClient.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 8.0)");
        webClient.DownloadFile(new Uri(file.Address), file.SaveLocation);
    }
}
catch (WebException e)
{
    // read source code here...
}
chaotic
  • `404` means the endpoint was not found. Not sure how you want to read the source of something that does not exist. – Igor May 11 '18 at 20:46
  • A 404 can return a custom error page. The status code means the file isn't found, not necessarily that no page is returned. – chaotic May 11 '18 at 20:46
  • Do you want to get the "page", as in the 404 page? Literally meaning the page that your server provides given a "not found" error? Why not just open it in the browser and hit "view source" in the context menu? Or if you're on Linux, use curl or wget? – Kamil Jarosz May 11 '18 at 20:47
  • I need to do this in C# for a generic URL. I need to get the content of the 404 error page. – chaotic May 11 '18 at 20:48
  • The MSDN page says that WebException has a Response and a Source property (no, "source" is not "source code" in this case). Maybe one of those is what you're looking for? Have you tried either of them? https://msdn.microsoft.com/en-us/library/system.net.webexception(v=vs.110).aspx – Kamil Jarosz May 11 '18 at 20:51
  • By 404 page, do you mean a "soft 404" page, such as this one? https://www.gstatic.com/images/icons/ I.e. it returns page content with a 404 header. – David Horák May 11 '18 at 20:51
  • Yes, that's what I mean. – chaotic May 11 '18 at 20:53

1 Answer


This is a tested solution which, in all fairness, @KamilJarosz hinted at in a comment on your question:

...
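catch (WebException e)
{
    // e.Response is null when no response was received at all (e.g. DNS failure or timeout);
    // the pattern match also skips responses that are not HTTP responses.
    if (e.Response is HttpWebResponse response && response.StatusCode == HttpStatusCode.NotFound)
    {
        // Read the body of the custom 404 page into a string.
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            var html404Page = reader.ReadToEnd();
        }
    }
}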

I assumed you wanted the page as a string, so I adapted the answer to this question.

EDIT

I also added a guard clause, so that nothing is processed when the Response is null or when the response is not a 404.
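If it helps, here is a minimal sketch of how the guard and the stream reading could be wrapped into a reusable helper. The class name `NotFoundPage` and the methods `TryReadBody` and `DownloadOr404Body` are names I made up for illustration, and I'm assuming you are fine with `DownloadString` rather than `DownloadFile`, and with the default UTF-8 decoding of `StreamReader`.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper class; the names are illustrative only.
public static class NotFoundPage
{
    // Returns the body carried by a 404 response, or null when the exception
    // has no HTTP response (DNS failure, timeout) or the status is not 404.
    public static string TryReadBody(WebException e)
    {
        if (e.Response is HttpWebResponse response
            && response.StatusCode == HttpStatusCode.NotFound)
        {
            using (var stream = response.GetResponseStream())
            using (var reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }
        return null;
    }

    // Downloads the URL as a string. On a 404 it returns the error page's
    // HTML instead; any other failure is rethrown unchanged.
    public static string DownloadOr404Body(string url)
    {
        using (var webClient = new WebClient())
        {
            webClient.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 8.0)");
            try
            {
                return webClient.DownloadString(url);
            }
            catch (WebException e)
            {
                var body = TryReadBody(e);
                if (body == null) throw;
                return body;
            }
        }
    }
}
```

Calling `NotFoundPage.DownloadOr404Body(file.Address)` should then give you either the normal content or the HTML of the 404 page, which you can inspect to decide what kind of 404 it is.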

Francesco B.