I wanted to see if there’s a way in C# to only download the HTML of a web page if the page is under N bytes in size. We’d like to store the output of pages with certain status codes, but only if the HTML on the web page is less than N bytes.
-
What have you tried? Such as looking at the Content-Length header before downloading the whole file. – Can Poyrazoğlu Jan 30 '14 at 22:59
-
Maybe this link can help: http://stackoverflow.com/questions/3741814/get-http-file-size-download-location-and-url-in-a-label – pa1geek Jan 30 '14 at 23:01
-
In many cases this is fundamentally not possible unless you read enough to know that the response is going to be larger than your limit (e.g. when `Transfer-Encoding: Chunked`). On the other end of the spectrum, a `HEAD` request might be enough if the server returns "cooperative" headers. – Jon Jan 30 '14 at 23:02
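As a sketch of the HEAD-request approach mentioned above, using `HttpClient` (available from .NET 4.5); the URL and limit here are placeholders, and `ContentLength` is `null` whenever the server omits the header:

```csharp
using System;
using System.Net.Http;

class HeadCheck
{
    static void Main()
    {
        const long limit = 1000000; // illustrative byte budget
        var url = "http://www.stackoverflow.com";

        using (var client = new HttpClient())
        {
            // HEAD asks for headers only; no response body is transferred
            var request = new HttpRequestMessage(HttpMethod.Head, url);
            using (var response = client.SendAsync(request).Result)
            {
                long? length = response.Content.Headers.ContentLength;
                if (length.HasValue && length.Value < limit)
                    Console.WriteLine("Small enough to download");
                else
                    Console.WriteLine("Size unknown or over the limit");
            }
        }
    }
}
```

This only helps when the server is "cooperative", as noted: with chunked transfer encoding there is no Content-Length to inspect.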
2 Answers
6
Using HttpWebRequest with Method = "HEAD" you can get the page's header information without downloading the whole page, which is much faster. Once you know the size, you can decide whether to load the page, using WebClient for the actual download.
As Jon pointed out, the Content-Length header might not be present, in which case -1 will be returned. If that is the case you will need to download the full page and check its size from there.
using System;
using System.Net;

void Main()
{
    const long PageSizeLimit = 1000000;
    var url = "http://www.stackoverflow.com";

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "HEAD";

    long pageSize;
    string page;
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        pageSize = response.ContentLength;
    }

    if (pageSize < 0)
    {
        // Content-Length was not present -> get the full page
        // and check its size after the download
        page = DownloadPage(url);
        if (page.Length < PageSizeLimit)
        {
            ProcessPage(page);
        }
    }
    else if (pageSize < PageSizeLimit)
    {
        page = DownloadPage(url);
        ProcessPage(page);
    }
    // otherwise the page is known to be too large, so skip the download
}

public string DownloadPage(string url)
{
    using (var webClient = new WebClient())
    {
        return webClient.DownloadString(url);
    }
}

public void ProcessPage(string page)
{
    // do your processing
}
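Note that page.Length above counts characters, not bytes. If you want a hard byte budget even when Content-Length is missing (or the response is chunked), one option is to read the response stream incrementally and stop as soon as the limit is exceeded, so at most N bytes plus one buffer are ever transferred. A minimal sketch; `TryDownloadPage`, the buffer size, and the UTF-8 assumption are illustrative, not part of the answer:

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

class BoundedDownload
{
    // Returns the page text if the body stays under limit bytes, else null.
    static string TryDownloadPage(string url, long limit)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var buffer = new MemoryStream())
        {
            var chunk = new byte[8192];
            int read;
            while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
            {
                buffer.Write(chunk, 0, read);
                if (buffer.Length > limit)
                    return null; // over budget: stop reading and discard
            }
            // assumes a UTF-8 body; a real version would honor the charset
            return Encoding.UTF8.GetString(buffer.ToArray());
        }
    }
}
```

This works regardless of whether the server sends a Content-Length header, at the cost of downloading up to the limit before giving up.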

Vlad Bezden
-
Note that this will only work at the pleasure of the server: ["If the Content-Length header is not set in the response, `ContentLength` is set to the value -1."](http://msdn.microsoft.com/en-us/library/system.net.httpwebresponse.contentlength(v=vs.110).aspx), so it's not a guaranteed solution by any means. – Jon Jan 30 '14 at 23:30
0
You can use the following to find the size and error details:
WebClient client = new WebClient();
byte[] data = client.DownloadData(new System.Uri("http://www.google.com"));
You can then check data.Length to get the size of the downloaded content in bytes.
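This downloads the page first and only then measures it (as the comment below points out), so it only helps if fetching oversized pages is acceptable. A sketch of the size check; the limit value is illustrative:

```csharp
using System;
using System.Net;

class SizeCheck
{
    static void Main()
    {
        const int limit = 1000000; // illustrative byte budget
        var client = new WebClient();
        byte[] data = client.DownloadData(new Uri("http://www.google.com"));

        // data.Length is the size of the downloaded body in bytes
        if (data.Length < limit)
            Console.WriteLine("Within limit: " + data.Length + " bytes");
        else
            Console.WriteLine("Too large: " + data.Length + " bytes");
    }
}
```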

Thanigainathan
-
That will have already downloaded the page, though. The OP doesn't want to download the page if it's over a certain size. – Jedidja Jan 31 '14 at 00:40