I was going to leave this as a comment to @Dyppl's response, but it became too long.
Dyppl's response is generally good advice, and the way that I would approach this problem. However, there are a few things you should keep in mind.
First, there's no reason to do a `HEAD` request followed by a `GET` if the page has been modified. You can do a single `GET` with the `IfModifiedSince` property set (which sends the `If-Modified-Since` header), and the server will return either the entire page or a `304 Not Modified` status. Doing the `HEAD` first, followed by the `GET`, means two requests to the server whenever the page has changed, which defeats much of the purpose of the conditional request.
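The property names above come from .NET's `HttpWebRequest`/`HttpWebResponse`. Purely as an illustration in another language, here is a minimal Python sketch of the single conditional `GET` using the standard library's `urllib`. The function name `conditional_get` and its return shape are my own assumptions. Note that `urllib` reports a 304 by raising `HTTPError`, so the sketch catches that case and treats it as "keep the cached copy".

```python
import urllib.error
import urllib.request


def conditional_get(url, last_modified=None, timeout=10.0):
    """Fetch url with ONE conditional GET (hypothetical helper).

    Returns (status, body, last_modified). On 304 the body is None and
    the previously saved Last-Modified value is returned unchanged.
    """
    request = urllib.request.Request(url, method="GET")
    if last_modified:
        # Replay the validator saved from the previous response.
        request.add_header("If-Modified-Since", last_modified)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # 200: full page; remember the server's new Last-Modified.
            return response.status, response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not modified: the cached copy is still current.
            return 304, None, last_modified
        raise
```

Only one round trip happens in either case; the `HEAD` request is never needed.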
Second, you should set the `IfModifiedSince` property to the `LastModified` value returned by the previous response (that is, `HttpWebResponse.LastModified`), because the server's clock might not be synchronized with your computer's. Also, I've found that a large percentage of sites, particularly those with generated content (like WordPress blogs), lie: they always return the current date/time in the `Last-Modified` header. As a result, there is no benefit to doing the `If-Modified-Since` check on those sites.
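A minimal Python sketch of both points, standard library only. The first hypothetical helper replays the server's `Last-Modified` string verbatim as `If-Modified-Since`, so only the server's clock is ever involved. The second is a rough heuristic, my own assumption rather than anything stated above, that flags a server whose `Last-Modified` is essentially equal to its `Date` header, i.e. one that stamps every response with the current time. The 5-second tolerance is arbitrary.

```python
from email.utils import parsedate_to_datetime


def conditional_headers(saved_last_modified):
    """Build request headers from the server's own Last-Modified string.

    The value is sent back exactly as received; the local clock is never used.
    """
    if not saved_last_modified:
        return {}
    return {"If-Modified-Since": saved_last_modified}


def last_modified_is_just_now(headers, tolerance_seconds=5):
    """Heuristic (assumption): True if Last-Modified is within
    tolerance_seconds of the response's Date header, suggesting the
    server always reports the current time."""
    last_modified = headers.get("Last-Modified")
    date = headers.get("Date")
    if not last_modified or not date:
        return False
    try:
        lm = parsedate_to_datetime(last_modified)
        now = parsedate_to_datetime(date)
        return abs((now - lm).total_seconds()) <= tolerance_seconds
    except (TypeError, ValueError):
        # Unparseable dates, or a naive/aware mix: make no claim.
        return False
```

A site that trips the heuristic on several fetches is a candidate for the `ContentLength` approach instead.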
If you know that the site lies and always returns the current date/time, you can keep track of the `ContentLength` header that's returned from the page when you download it. Then, when you want to check whether the page has changed, do a `HEAD` request and compare the returned `ContentLength` header with the saved value. If they match, it's unlikely that the page has changed. If they don't match, do a `GET` request to update your copy of the page and keep the new `ContentLength`.
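The `HEAD`-then-`GET` check could be sketched in Python with `urllib` as follows. `refresh_if_length_changed` is a hypothetical name, and treating a missing or malformed `Content-Length` as "changed" is an assumption that errs toward re-downloading, since some servers omit the header.

```python
import urllib.request


def _content_length(response):
    """Return Content-Length as an int, or None if absent or malformed."""
    value = response.headers.get("Content-Length")
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def refresh_if_length_changed(url, saved_length, timeout=10.0):
    """HEAD the page; GET it only if Content-Length differs from saved_length.

    Returns (changed, body, length). When the lengths match, body is None
    and saved_length is returned unchanged.
    """
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head, timeout=timeout) as response:
        head_length = _content_length(response)
    if head_length is not None and head_length == saved_length:
        # Same length: assume (not guarantee) the page is unchanged.
        return False, None, saved_length
    # Different or missing length: download and keep the GET's Content-Length.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return True, response.read(), _content_length(response)
```

The caller stores the returned length alongside its cached copy and passes it back on the next check.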
This technique does have the disadvantage of requiring two requests if the page has changed. It's also not 100% reliable on all servers: some return a different `ContentLength` for the `HEAD` request than for the `GET`, and some don't return a valid `ContentLength` at all. That said, I've found it to be effective for a large number of sites.