
I have a website from which I want to download files. A new file is uploaded about every other day. How can I check whether a new file is up yet?

Ex: url1 = website.com/file_2013-06-27.zip <-- uploaded

url2 = website.com/file_2013-06-29.zip <-- not uploaded

If I go to url2, it redirects back to website.com after 5 seconds. Its source code is: <meta http-equiv="refresh" content="5;url=http://website.com" /> Error: 2 [ Not Allowed ]

The files are 100 MB+ each, and if I try to look at the source with urllib.urlopen("website.com/file_2013-06-27.zip").read(), it takes a while when the file exists, because the whole file is downloaded.

What's a quick way to check whether a new file has been uploaded?

Thanks

Vaibhav Aggarwal
  • If the server supports this, you can issue a HEAD request; if it doesn't, just issue a GET request and read only the headers from the socket (that is, everything that precedes the first blank line). – michaelmeyer Jul 01 '13 at 22:18
  • Thanks, I used: `import httplib2; h = httplib2.Http(); resp = h.request("http://www.google.com", 'HEAD')[0]['content-type']` from: http://stackoverflow.com/questions/4421170/python-head-request-with-urllib2 – Vaibhav Aggarwal Jul 01 '13 at 22:38
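The HEAD approach from these comments can be sketched with the standard library alone. This is a minimal sketch for Python 3 (the question itself uses Python 2's urllib), not the exact code the asker ran. The helper names `head_headers` and `looks_like_file` are hypothetical, and the check assumes the server labels its "Not Allowed" page as `text/html` while a real archive gets some other content type.

```python
import urllib.error
import urllib.request


def head_headers(url, timeout=10):
    """Send a HEAD request; return (status, headers) without fetching a body."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status, raw = resp.status, resp.headers
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions but still carry headers.
        status, raw = err.code, err.headers
    # Lower-case the header names so lookups do not depend on server casing.
    return status, {name.lower(): value for name, value in raw.items()}


def looks_like_file(status, headers):
    """Assumption: the error page is HTML, the real download is not."""
    content_type = headers.get("content-type", "").lower()
    return status == 200 and not content_type.startswith("text/html")
```

Usage would be along the lines of `looks_like_file(*head_headers("http://website.com/file_2013-06-29.zip"))`. Whether the content type actually differs between the two cases depends on the server, so it is worth printing the headers for one known-good and one known-missing URL first.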

1 Answer


Python's Requests library is great for checking things like HTTP status codes (not downloading files, just getting the response).

For example:

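import requests

# stream=True fetches only the status line and headers;
# the body is not downloaded unless r.content is read.
r = requests.get('http://website.com/file_2013-06-27.zip', stream=True)
if r.status_code == 200:
    print("File uploaded.")
r.close()  # release the connection without reading the body

With stream=True, that doesn't download the file; it only checks whether the web server will serve it and what the HTTP response is. (Without stream=True, requests.get downloads the whole response body before returning, and a URL without the http:// scheme is rejected.) With HTTP, 200 means the request succeeded, which normally means that the file exists and is accessible. See below for more info on HTTP response codes.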

More info:
http://docs.python-requests.org/en/latest/ - the requests library
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html - guide to HTTP response codes

Charles Newey
  • My problem was solved by the comment below my question. But in response to your answer: I had tried that on my website before posting my question, and it responded with 200 even when the file wasn't there, because the server just returns a page that redirects you. So that wouldn't work. Thanks though. – Vaibhav Aggarwal Jul 01 '13 at 23:42
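Since the server answers 200 for the redirect page too, one alternative to inspecting headers is to read just the first few bytes of the body and check for a ZIP signature. This is a hedged sketch, not something proposed in the thread: the helper names `is_zip_prefix` and `file_is_uploaded` are hypothetical, it targets Python 3, and it assumes the downloads are genuine ZIP archives (which start with `PK\x03\x04`, or `PK\x05\x06` when empty).

```python
import urllib.error
import urllib.request

# Signatures of a regular ZIP archive and of an empty one.
ZIP_SIGNATURES = (b"PK\x03\x04", b"PK\x05\x06")


def is_zip_prefix(data):
    """Return True if the given bytes start with a ZIP signature."""
    return data.startswith(ZIP_SIGNATURES)


def file_is_uploaded(url, timeout=10):
    """Issue a GET but read only the first four bytes of the body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Leaving the with-block closes the connection, so the
            # remaining 100 MB+ of the archive is never transferred.
            return is_zip_prefix(resp.read(4))
    except urllib.error.HTTPError:
        # A real 4xx/5xx answer also means the file is not available.
        return False
```

The redirect page begins with `<meta ...`, so it fails the signature check even though its status code is 200.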