9

Suppose I have a program A. I run it, and it performs some operation on a file foo.txt. Then A terminates.

On the next run, A checks whether foo.txt has changed. If it has, A runs its operation again; otherwise, it quits.

Does a library function or external library for this exist?

Of course it can be implemented with an md5 plus a file/db containing the md5. I want to avoid reinventing the wheel.

Charles Sprayberry
Stefano Borini

4 Answers

10

It's unlikely that someone made a library for something so simple. Solution in 13 lines:

    import pickle
    import md5
    try:
        l = pickle.load(open("db"))
    except IOError:
        l = []
    db = dict(l)
    path = "/etc/hosts"
    checksum = md5.md5(open(path).read())
    if db.get(path, None) != checksum:
        print "file changed"
        db[path] = checksum
    pickle.dump(db.items(), open("db", "w"))
Sufian
  • It would probably be worthwhile first checking st_mtime and st_size: if they've changed, you don't need to checksum, saving time. –  Dec 16 '09 at 06:05
  • A number of things could be done to make this as configurable/one-size-fits-all of a solution as you'd like. My point is simply that it's an easy problem, and it will take longer to look for and configure a general case library than to roll your own. – Sufian Dec 16 '09 at 06:12
  • There are many simple functionalities in the standard library that are solved with a few lines of code, but there they are :) Thanks for the code! – Stefano Borini Dec 16 '09 at 06:13
  • Hi, so I got `TypeError: 'builtin_function_or_method' object is not iterable` in line `db = dict(l)`. When I printed `l`, I got ``. Any way to fix this? – CrazyVideoGamer Jun 20 '23 at 02:59
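
The stat pre-check suggested in the comments above might look roughly like the sketch below. It is written for Python 3, and the names `md5_hex`, `snapshot` and `has_changed` are made up for illustration. It assumes that a different size always means changed content and that an identical size and `st_mtime` means unchanged content (not true for every kind of file), and it falls back to an MD5 comparison only when the size matches but the modification time differs.

```python
import hashlib
import os


def md5_hex(path):
    """Return the MD5 hex digest of the file's bytes."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def snapshot(path):
    """Record what is needed to detect a change later (hypothetical helper)."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime": st.st_mtime, "md5": md5_hex(path)}


def has_changed(path, old):
    """Compare path against an earlier snapshot, hashing only when needed."""
    st = os.stat(path)
    if st.st_size != old["size"]:
        return True   # different size: content must differ, no checksum needed
    if st.st_mtime == old["mtime"]:
        return False  # assumption: same size and mtime means untouched
    # Same size but a new mtime: only the checksum can tell.
    return md5_hex(path) != old["md5"]
```

The snapshot dictionary would still have to be saved between runs, for example with pickle as in the answer above.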
6

FYI, for those using this example who get the error "TypeError: can't pickle HASH objects": store the hex digest instead of the hash object itself. (Optionally switch from md5 to hashlib, since the md5 module is deprecated.) The version below also opens the files in binary mode and uses print() as a function, so it runs on Python 3 as well:

    import pickle
    import hashlib  # instead of md5
    try:
        l = pickle.load(open("db", "rb"))
    except IOError:
        l = []
    db = dict(l)
    path = "/etc/hosts"
    # hexdigest() converts the hash to text, which can be pickled
    checksum = hashlib.md5(open(path, "rb").read()).hexdigest()
    if db.get(path, None) != checksum:
        print("file changed")
        db[path] = checksum
    pickle.dump(list(db.items()), open("db", "wb"))

So the essential change is from:

    checksum = hashlib.md5(open(path, "rb").read())

to

    checksum = hashlib.md5(open(path, "rb").read()).hexdigest()
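
If you want this as a reusable function, one possible shape is sketched below. This is only an illustration, not a library API: the names `md5_of` and `changed_since_last_run` and the default state file `checksums.json` are my own assumptions. It reads the file in chunks so large files are not loaded into memory at once, and it keeps the checksums in a JSON file rather than a pickle.

```python
import hashlib
import json


def md5_of(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_since_last_run(path, state_file="checksums.json"):
    """Return True if path differs from the checksum stored in state_file.

    The stored checksum is updated whenever a change is detected.
    """
    try:
        with open(state_file) as f:
            db = json.load(f)
    except (IOError, ValueError):
        db = {}  # missing or unreadable state: treat every file as new
    checksum = md5_of(path)
    if db.get(path) == checksum:
        return False
    db[path] = checksum
    with open(state_file, "w") as f:
        json.dump(db, f)
    return True
```

Program A would then call something like `if changed_since_last_run("foo.txt"): ...` before doing its work. Note that a file seen for the first time counts as changed.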
T.C.
James Nelson
2

This is one of those things that is both so trivial to implement and so app-specific that there really wouldn't be any point in a library. Any library intended for this purpose would grow so unwieldy trying to adapt to the many variations required that learning and using it would take as much time as implementing it yourself.

Nicholas Knight
0

Can't we just check the last-modified date? That is, after the first operation we store the file's last-modified date in the db, and before running again we compare the last-modified date of foo.txt with the stored value. If they differ, we perform the operation again.
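
A rough Python 3 sketch of that idea follows. The function name `mtime_changed` and the state file `mtimes.json` are placeholders I made up, and a small JSON file stands in for the db. It reports a change whenever the stored modification time differs from the current one, regardless of content.

```python
import json
import os


def mtime_changed(path, state_file="mtimes.json"):
    """Return True if path's modification time differs from the stored one."""
    try:
        with open(state_file) as f:
            db = json.load(f)
    except (IOError, ValueError):
        db = {}  # no usable state yet: treat the file as changed
    mtime = os.stat(path).st_mtime_ns  # integer nanoseconds, exact in JSON
    if db.get(path) == mtime:
        return False
    db[path] = mtime
    with open(state_file, "w") as f:
        json.dump(db, f)
    return True
```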

NM.
  • That's what make does, and I frankly prefer not to. – Stefano Borini Dec 16 '09 at 05:50
  • What is the problem using modification time? –  Dec 16 '09 at 06:03
  • Suppose the file is downloaded every hour from a remote website, or generated by some source that recreates the file and is beyond my control. The modification time will change, but if the actual content is the same, there's no point in re-executing the task. – Stefano Borini Dec 16 '09 at 06:30
  • Of course you can work around it (for example, write to a temporary file, and then overwrite only if changed, after md5 comparison of the two). I agree there are other solutions. – Stefano Borini Dec 16 '09 at 06:34
  • Some types of files can also change contents without the size or last-modified date changing. This is the case in particular with TrueCrypt encryption files. SyncBack acknowledges this: you can opt to "check whether contents have modified by a more dependable (but slower) method..." – mike rodent Mar 23 '14 at 08:10
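
The workaround mentioned in the comments (write to a temporary file, then overwrite only if the content differs) could be sketched like this in Python 3. The name `install_if_different` is hypothetical, and the sketch assumes both paths are on the same filesystem so that `os.replace` can move the file.

```python
import hashlib
import os


def _md5(path):
    """Return the MD5 hex digest of the file's bytes."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def install_if_different(tmp_path, target_path):
    """Move tmp_path over target_path only when the contents differ.

    Returns True if target_path was replaced. When the contents are
    identical, the temporary file is deleted and the target (including
    its modification time) is left untouched.
    """
    if os.path.exists(target_path) and _md5(tmp_path) == _md5(target_path):
        os.remove(tmp_path)
        return False
    os.replace(tmp_path, target_path)
    return True
```

With this in place, a plain modification-time check on the target file only fires when the content really changed.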