I wanted to see if there’s a way in C# to only download the HTML of a web page if the page is under N bytes in size. We’d like to store the output of pages with certain status codes, but only if the HTML on the web page is less than N bytes.
-
What have you tried? Such as looking at the Content-Length header before downloading the whole file. – Can Poyrazoğlu Jan 30 '14 at 22:59
-
Maybe this link can help: http://stackoverflow.com/questions/3741814/get-http-file-size-download-location-and-url-in-a-label – pa1geek Jan 30 '14 at 23:01
-
In many cases this is fundamentally not possible unless you read enough to know that the response is going to be larger than your limit (e.g. when `Transfer-Encoding: Chunked`). On the other end of the spectrum, a `HEAD` request might be enough if the server returns "cooperative" headers. – Jon Jan 30 '14 at 23:02
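As a sketch of the HEAD-request approach mentioned above, using `HttpClient` (available from .NET 4.5); the URL and limit here are placeholders, and `ContentLength` is `null` whenever the server omits the header:

```csharp
using System;
using System.Net.Http;

class HeadCheck
{
    static void Main()
    {
        const long limit = 1000000; // illustrative byte budget
        var url = "http://www.stackoverflow.com";

        using (var client = new HttpClient())
        {
            // HEAD asks for headers only; no response body is transferred
            var request = new HttpRequestMessage(HttpMethod.Head, url);
            using (var response = client.SendAsync(request).Result)
            {
                long? length = response.Content.Headers.ContentLength;
                if (length.HasValue && length.Value < limit)
                    Console.WriteLine("Small enough to download");
                else
                    Console.WriteLine("Size unknown or over the limit");
            }
        }
    }
}
```

This only helps when the server is "cooperative", as noted: with chunked transfer encoding there is no Content-Length to inspect.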
2 Answers
6
Using HttpWebRequest with Method = "HEAD" you can get the page's header information without downloading the whole page, which is much faster. Once you know the size, you can decide whether to load the page, using WebClient for the actual download.
As Jon pointed out, the Content-Length header might not be present, in which case -1 will be returned. If that is the case you will need to download the full page and check its size from there.
using System;
using System.Net;

void Main()
{
    const long PageSizeLimit = 1000000;
    var url = "http://www.stackoverflow.com";

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "HEAD";

    long pageSize;
    string page;
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        pageSize = response.ContentLength;
    }

    if (pageSize < 0)
    {
        // Content-Length was not present -> get the full page
        // and check its size after the download
        page = DownloadPage(url);
        if (page.Length < PageSizeLimit)
        {
            ProcessPage(page);
        }
    }
    else if (pageSize < PageSizeLimit)
    {
        page = DownloadPage(url);
        ProcessPage(page);
    }
    // otherwise the page is known to be too large, so skip the download
}

public string DownloadPage(string url)
{
    using (var webClient = new WebClient())
    {
        return webClient.DownloadString(url);
    }
}

public void ProcessPage(string page)
{
    // do your processing
}
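Note that page.Length above counts characters, not bytes. If you want a hard byte budget even when Content-Length is missing (or the response is chunked), one option is to read the response stream incrementally and stop as soon as the limit is exceeded, so at most N bytes plus one buffer are ever transferred. A minimal sketch; `TryDownloadPage`, the buffer size, and the UTF-8 assumption are illustrative, not part of the answer:

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

class BoundedDownload
{
    // Returns the page text if the body stays under limit bytes, else null.
    static string TryDownloadPage(string url, long limit)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var buffer = new MemoryStream())
        {
            var chunk = new byte[8192];
            int read;
            while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
            {
                buffer.Write(chunk, 0, read);
                if (buffer.Length > limit)
                    return null; // over budget: stop reading and discard
            }
            // assumes a UTF-8 body; a real version would honor the charset
            return Encoding.UTF8.GetString(buffer.ToArray());
        }
    }
}
```

This works regardless of whether the server sends a Content-Length header, at the cost of downloading up to the limit before giving up.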

Vlad Bezden
-
Note that this will only work at the pleasure of the server: ["If the Content-Length header is not set in the response, `ContentLength` is set to the value -1."](http://msdn.microsoft.com/en-us/library/system.net.httpwebresponse.contentlength(v=vs.110).aspx), so it's not a guaranteed solution by any means. – Jon Jan 30 '14 at 23:30
0
You can use the following to find the size and error details:
WebClient client = new WebClient();
byte[] data = client.DownloadData(new System.Uri("http://www.google.com"));
You can then check data.Length to get the size of the downloaded content in bytes.
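This downloads the page first and only then measures it (as the comment below points out), so it only helps if fetching oversized pages is acceptable. A sketch of the size check; the limit value is illustrative:

```csharp
using System;
using System.Net;

class SizeCheck
{
    static void Main()
    {
        const int limit = 1000000; // illustrative byte budget
        var client = new WebClient();
        byte[] data = client.DownloadData(new Uri("http://www.google.com"));

        // data.Length is the size of the downloaded body in bytes
        if (data.Length < limit)
            Console.WriteLine("Within limit: " + data.Length + " bytes");
        else
            Console.WriteLine("Too large: " + data.Length + " bytes");
    }
}
```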

Thanigainathan
-
That will have already downloaded the page, though. The OP doesn't want to download the page if it's over a certain size. – Jedidja Jan 31 '14 at 00:40