1

I'm trying to create a method that downloads files from a server (using HTTP) only if the file on the server is newer than the version already downloaded to my computer.

I found a way to get the last time that the file was modified on the server (at least the last time that the server thinks that it was modified):

u = urllib2.urlopen(url)
meta = u.info()
print("Last Modified: " + str(meta.getheaders("Last-Modified")))

The problem now is how to use this information to compare with the files that I already have on my computer and see if that version located on the server is newer than the version saved on my computer.

I tried the python-wget library, but it didn't help. It downloads everything and doesn't even overwrite the existing files (it creates new ones), so I concluded that the library doesn't check timestamps.

What is the best way to solve that?

Alexandre Lara
  • Does using `wget` by itself work for you? Check out this post on using wget and timestamping: https://stackoverflow.com/questions/3423473/wget-checking-for-file-timestamp-and-overwriting – serk Jun 29 '15 at 00:40
  • @serk It would probably work; however, I need to use it on a Python script, so it would be better if I could use Python itself. – Alexandre Lara Jun 29 '15 at 13:11

3 Answers

6

Use os.path.getmtime to get the local file's modification time.

You then need to convert the Last-Modified header from the URL into a timestamp so that the local and server times can be compared:

import calendar, datetime, os
import urllib2

u = urllib2.urlopen(url)
meta = u.info()
print("Last Modified: " + str(meta.getheaders("Last-Modified")))

# CONVERTING HEADER TIME TO UTC TIMESTAMP
# ASSUMING 'Sun, 28 Jun 2015 06:30:17 GMT' FORMAT
# calendar.timegm() reads the time tuple as UTC; time.mktime() would read it as local time
meta_modifiedtime = calendar.timegm(datetime.datetime.strptime(
                    meta.getheaders("Last-Modified"), "%a, %d %b %Y %H:%M:%S GMT").timetuple())

# Raw string so the backslashes in the Windows path are never treated as escapes
file = r'C:\Path\ToFile\somefile.xml'
if os.path.getmtime(file) > meta_modifiedtime:
   print("CPU file is older than server file.")
else:
   print("CPU file is NOT older than server file.")
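As a hedged alternative to hand-writing the strptime format, the standard library's email.utils module can parse HTTP dates directly. The sketch below is written for Python 3, although parsedate_tz and mktime_tz also exist in Python 2.7. The function name http_date_to_epoch is hypothetical and is not part of the answer above.

```python
import email.utils


def http_date_to_epoch(header_value):
    """Convert an HTTP date string to seconds since the epoch (UTC).

    http_date_to_epoch is a hypothetical helper name used for illustration.
    """
    # parsedate_tz returns a 10-tuple (including the UTC offset) or None
    parsed = email.utils.parsedate_tz(header_value)
    if parsed is None:
        raise ValueError("unparseable HTTP date: %r" % (header_value,))
    # mktime_tz applies the offset, so the result does not depend on the
    # local time zone of the machine running the script
    return email.utils.mktime_tz(parsed)


if __name__ == "__main__":
    print(http_date_to_epoch("Sun, 28 Jun 2015 06:30:17 GMT"))
```

The returned value is on the same scale as os.path.getmtime, so the two numbers can be compared directly.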
Parfait
1

I don't have enough reputation yet to add a comment to an answer so I just added my own answer.

I am using Python 2.7 and had to modify the answer given by @parfait to get this to work.

1) I had to convert the list you get from meta.getheaders("Last-Modified") to a string.

2) When a time is converted to seconds since the epoch, a larger number means a later date, because more seconds have elapsed. So I also changed the > to < in the if statement.

The result is as follows:

import calendar, datetime, os
import urllib2

u = urllib2.urlopen(url)
meta = u.info()
print("Last Modified: " + str(meta.getheaders("Last-Modified")))

# CONVERTING HEADER TIME TO UTC TIMESTAMP
# ASSUMING 'Sun, 28 Jun 2015 06:30:17 GMT' FORMAT
# Remember datetime.datetime.strptime() takes a string as the first param...
# calendar.timegm() reads the time tuple as UTC; time.mktime() would read it as local time
meta_modifiedtime = calendar.timegm(datetime.datetime.strptime(
                    ''.join(meta.getheaders("Last-Modified")), "%a, %d %b %Y %H:%M:%S GMT").timetuple())

# Raw string so the backslashes in the Windows path are never treated as escapes
file = r'C:\Path\ToFile\somefile.xml'
if os.path.getmtime(file) < meta_modifiedtime: #change > to <
   print("CPU file is older than server file.")
else:
   print("CPU file is NOT older than server file.")
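The comparison above can be wrapped in a small helper that also covers the case where nothing has been downloaded yet. This is only a sketch for Python 3: the name needs_download is hypothetical, and treating an unparseable header as "download anyway" is an assumption, not something stated in this thread.

```python
import email.utils
import os


def needs_download(local_path, last_modified_header):
    """Return True when the local copy is missing or older than the server copy.

    needs_download is a hypothetical helper name used for illustration.
    """
    if not os.path.exists(local_path):
        return True  # nothing saved yet, so there is nothing to compare
    parsed = email.utils.parsedate_tz(last_modified_header)
    if parsed is None:
        return True  # assumption: download when the header cannot be parsed
    server_mtime = email.utils.mktime_tz(parsed)
    # Both values are seconds since the epoch; a smaller number is an older file
    return os.path.getmtime(local_path) < server_mtime
```

A caller would pass the path of the saved file together with the Last-Modified value read from the response headers.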
Gerald Murphy
0

Comparing files based on last-modified is not the best way to do this but since you asked...

from __future__ import print_function
import email.utils
import os.path
import shutil
import time

import requests

url = 'https://www.google.com/images/srpr/logo11w.png'
file = 'logo11w.png'

# A HEAD request fetches only the headers, not the file body
r = requests.head(url, allow_redirects=True)

meta = r.headers['last-modified']
print("Web  Last Modified: {0}".format(meta))

filetime = time.strftime('%a, %d %b %Y %H:%M:%S GMT', time.gmtime(os.path.getmtime(file)))
print("File Last Modified: {0}".format(filetime))

# Compare epoch seconds; comparing the formatted strings would only sort them alphabetically
web_time = email.utils.mktime_tz(email.utils.parsedate_tz(meta))
if web_time > os.path.getmtime(file):
    print("Newer file found! Downloading...")
    response = requests.get(url, stream=True)
    with open(file, 'wb') as out_file:
        shutil.copyfileobj(response.raw, out_file)
    del response
else:
    print('No new version found. You got the latest file!')
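One alternative, which the answer above does not spell out, is to let the server do the comparison with an HTTP conditional request: send the local file's modification time in an If-Modified-Since header and treat a 304 Not Modified reply as "nothing to download". The sketch below is for Python 3 and uses only the standard library (urllib.request instead of requests). The function names conditional_headers and download_if_newer are hypothetical, and the sketch assumes the server honours If-Modified-Since.

```python
import email.utils
import os
import shutil
import urllib.error
import urllib.request


def conditional_headers(local_path):
    """Build an If-Modified-Since header from the local file's mtime.

    Returns an empty dict when the file does not exist yet.
    """
    if not os.path.exists(local_path):
        return {}
    mtime = os.path.getmtime(local_path)
    # formatdate(..., usegmt=True) yields e.g. 'Sun, 28 Jun 2015 06:30:17 GMT'
    return {"If-Modified-Since": email.utils.formatdate(mtime, usegmt=True)}


def download_if_newer(url, local_path):
    """Download url to local_path; return False if the server replies 304."""
    request = urllib.request.Request(url, headers=conditional_headers(local_path))
    try:
        # The local file is opened only after the request has succeeded,
        # so a 304 reply leaves the existing copy untouched
        with urllib.request.urlopen(request) as response, \
                open(local_path, "wb") as out_file:
            shutil.copyfileobj(response, out_file)
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return False  # server copy is not newer than the local copy
        raise
    return True
```

A call such as download_if_newer(url, file), with the url and file variables from the answer above, would replace both the header check and the download step.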
serk