4

I am trying to learn python-watchdog, but I am a bit confused about why the job I set up runs more than once. So, here is my setup:

#handler.py
import os
from watchdog.events import FileSystemEventHandler
from actions import run_something

def getext(filename):
    return os.path.splitext(filename)[-1].lower()

class ChangeHandler(FileSystemEventHandler):

    def on_any_event(self, event):

        if event.is_directory:
            return
        if getext(event.src_path) == '.done':
            run_something()
        else: 
            print "event not directory.. exiting..."
            pass

the observer is set up like so:

#observer.py
import os
import time
from watchdog.observers import Observer
from handler import ChangeHandler

BASEDIR = "/path/to/some/directory/bin"

def main():

    while 1:

        event_handler = ChangeHandler()
        observer = Observer()
        observer.schedule(event_handler, BASEDIR, recursive=True)
        observer.start()
        try:
            while True:
                time.sleep(1)
        except KeyboardInterrupt:
            observer.stop()
        observer.join()

if __name__ == '__main__':
    main()

and finally, the actions like so:

#actions.py
import os
import subprocess

def run_something():
    output = subprocess.check_output(['./run.sh'])
    print output
    return None

...where ./run.sh is just a shell script I would like to run when a file with the extension .done is found in /path/to/some/directory/bin:

#run.sh
#!/bin/bash
echo "Job Start: $(date)"
rm -rf /path/to/some/directory/bin/job.done # remove the .done file
echo "Job Done: $(date)"

However, when I run python observer.py and then do a touch job.done in /path/to/some/directory/bin, I see that my shell script ./run.sh runs three times, not once.

I am confused about why this runs three times and not just once (I do delete the job.done file in my bash script).

JohnJ
  • `while 1` and `while True` are bad code, and I'm guessing that in your code it isn't even necessary, since `join` calls usually block until a condition is met. Your code will block indefinitely at some point, creating a zombie process which wastes system resources – specializt Dec 24 '14 at 18:12
  • Actually, I took the code straight out of a tutorial: http://ginstrom.com/scribbles/2012/05/10/continuous-integration-in-python-using-watchdog/ I have now deleted both `while 1` and `while True` from the code. Thanks again for the tip. – JohnJ Dec 24 '14 at 19:31
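
For reference, here is a minimal sketch of observer.py without the outer `while 1` loop discussed in the comments above. Observer.start() runs the watcher in a background thread, so a single sleep loop in the main thread is enough to keep the process alive (this assumes the same placeholder BASEDIR and the ChangeHandler from the question):

#observer.py (sketch without the redundant outer loop)
import time
from watchdog.observers import Observer
from handler import ChangeHandler

BASEDIR = "/path/to/some/directory/bin"

def main():
    event_handler = ChangeHandler()
    observer = Observer()
    observer.schedule(event_handler, BASEDIR, recursive=True)
    observer.start()  # watcher runs in a background thread
    try:
        while True:
            time.sleep(1)  # keep the main thread alive
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

if __name__ == '__main__':
    main()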

2 Answers

6

To debug watchdog scripts, it is useful to print what watchdog is seeing as events. One file edit or CLI command, such as touch, can result in multiple watchdog events. For example, if you insert a print statement:

class ChangeHandler(FileSystemEventHandler):

    def on_any_event(self, event):
        print(event)

to log every event, running

% touch job.done

generates

2014-12-24 13:11:02 - <FileCreatedEvent: src_path='/home/unutbu/tmp/job.done'>
2014-12-24 13:11:02 - <DirModifiedEvent: src_path='/home/unutbu/tmp'>
2014-12-24 13:11:02 - <FileModifiedEvent: src_path='/home/unutbu/tmp/job.done'>

Above there were two events with src_path ending in job.done. Thus,

    if getext(event.src_path) == '.done':
        run_something()

runs twice because there is a FileCreatedEvent and a FileModifiedEvent. You might be better off only monitoring FileModifiedEvents.
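
As a minimal sketch of that suggestion (and of the on_created variant mentioned in the comments below), the handler can override only one specific hook instead of on_any_event, so a single touch triggers the job once. This assumes the getext() helper and run_something() from the question; the class name DoneFileHandler is only illustrative:

#handler.py (sketch reacting to a single event type)
import os
from watchdog.events import FileSystemEventHandler
from actions import run_something

def getext(filename):
    return os.path.splitext(filename)[-1].lower()

class DoneFileHandler(FileSystemEventHandler):

    def on_modified(self, event):  # or on_created, which works the same way here
        if event.is_directory:
            return
        if getext(event.src_path) == '.done':
            run_something()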

unutbu
  • That's simply awesome - thanks a lot for this - I now use `on_created` to just monitor newly created files (since I delete the `job.done` in my bash script anyway). Works as expected! Accepted your answer :) – JohnJ Dec 24 '14 at 19:30
  • Do you think I am better off looking for `FileCreatedEvent` rather than `FileModifiedEvent`? – JohnJ Dec 24 '14 at 19:34
  • Well, I don't think it matters in this case. I suggested `FileModifiedEvent` only because it happens last, so you know whatever has been written to the file has been written. In this case, it sounds like you are not reading the file, so it does not matter. – unutbu Dec 24 '14 at 20:38
  • Guys, I'm trying to create a simple sync tool which will send these CRUD events to a sync handler. But apparently on any of the CRUD operations except update, multiple events are fired, and I want to take only the first one. E.g., in the case of create: a created event, then a modified event for the dir in which the file was created, then a modified event for the newly created file. How do I just get the created event and work with that? – bad_keypoints Feb 03 '18 at 11:24
  • @bad_keypoints: I think you would need to keep track of the timestamp of the last event on a per-file basis and write custom logic (based on comparison of timestamps) to decide if an event needs to be handled. I don't think there is a canned solution for this. – unutbu Feb 03 '18 at 12:27
  • It looks to me like an OS-related issue. On Windows 10, if I use `mkdir dir1\dir2`, I see two create events for dir2 folder. weird! – Mike IT Expert Mar 24 '23 at 14:25
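
A rough sketch of the per-file timestamp idea from the comment above: remember when each path was last handled and ignore follow-up events that arrive within a short window. The one-second window and the handle callback are illustrative assumptions, not part of watchdog's API:

import time
from watchdog.events import FileSystemEventHandler

class DebouncedHandler(FileSystemEventHandler):

    def __init__(self, handle, window=1.0):
        super().__init__()
        self._handle = handle      # callback to run once per burst of events
        self._window = window      # seconds during which repeats are ignored
        self._last_seen = {}       # src_path -> time of the last handled event

    def on_any_event(self, event):
        if event.is_directory:
            return
        now = time.monotonic()
        if now - self._last_seen.get(event.src_path, 0.0) < self._window:
            return                 # duplicate within the window; skip it
        self._last_seen[event.src_path] = now
        self._handle(event)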
0

I made a fix for watchdog:

import time

import watchdog.events
import watchdog.observers

osb = None  # remembers the src_path of the last modified event


class Handler(watchdog.events.PatternMatchingEventHandler):
    def on_any_event(self, event):
        # Note: on_any_event also fires for modified events, just before
        # on_modified, so osb is reset to None before the comparison below.
        global osb
        osb = None
        print(f"Watchdog received {event} event - {event.src_path}.")

    def on_modified(self, event):
        global osb
        if osb != event.src_path:
            # Code goes here
            pass
        osb = event.src_path


if __name__ == "__main__":
    src_path = r"C:\Users\Administrator\Desktop"
    event_handler = Handler()
    observer = watchdog.observers.Observer()
    observer.schedule(event_handler, path=src_path, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)  # avoid busy-waiting in the main thread
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

No, I will not explain, sorry (I forgot how it works and don't have time to find out).