5

I'm new to Python and I'm trying to implement reliable "file creation" detection. If I don't add a time.sleep(x), my files are processed incorrectly because they are still being written to the folder (the buffer is not empty). How can I work around this without waiting x seconds every time a file is created?

This is my code:

Main:

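import os
import sys
import time

from watchdog.observers import Observer

args = parser()  # parse once; parser() is defined elsewhere
if len(args) > 0:  # arguments are valid
    # Use the given log path if there is one, else the current directory.
    log_path = args['log_path'] if len(args) == 3 else os.getcwd()
    handler = Event_Handler()
    observer = Observer()
    observer.schedule(handler, args['src_fld'], recursive=True)
    observer.start()
    try:
        # Keep the main thread alive while the observer thread runs.
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
else:
    sys.exit(1)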

Event_Handler class:

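from watchdog.events import FileSystemEventHandler

class Event_Handler(FileSystemEventHandler):
    def on_created(self, event):
        if not event.is_directory:
            # Workaround: give the writer time to finish before processing.
            time.sleep(1)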

As I said, without that time.sleep(1), processing a big file fails because it has not been completely written yet.

peperunas

3 Answers

7

For the sake of any future readers who stumble upon this question, as I have, the answer appears to be that you cannot. Watchdog does not and will not support any feature to tell if a file is "complete" as Windows doesn't allow for it and Watchdog is meant to be system-agnostic.

If you're on Linux, using inotify directly is probably a safe bet, since it can report when a file that was open for writing is closed (IN_CLOSE_WRITE). Otherwise, on Windows, the best solutions I've found are:

Upload a big file, bigfile, and then a second marker file, bigfile-complete. When you find a file named name-complete, go back and upload/transfer/react to the original file, name. In this case, your files would all be added to the monitored directory in the order file, file-complete, file2, file2-complete, . . .
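A minimal sketch of the watcher side of this marker-file convention, using only the standard library. The "-complete" suffix comes from the description above; the function name completed_file_for is hypothetical, and the sketch assumes the marker is written to the same directory as the data file.

```python
import os

MARKER_SUFFIX = "-complete"  # naming convention assumed from the description above


def completed_file_for(marker_path, suffix=MARKER_SUFFIX):
    """Return the data file announced by a marker file.

    Returns None if marker_path is not a marker, or if the data file
    it refers to does not exist.
    """
    if not marker_path.endswith(suffix):
        return None
    data_path = marker_path[: -len(suffix)]
    return data_path if os.path.isfile(data_path) else None
```

In an on_created handler you would call completed_file_for(event.src_path) and process the returned path only when it is not None, so the big file itself is ignored until its marker shows up.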

Poll on the size of the file until it has remained fixed for a suitable length of time. When it hasn't changed in long enough that you can be reasonably certain it is finished, react to it as normal.
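As a rough sketch of this size-polling idea, standard library only. The function name wait_until_stable and the default values for quiet_period, poll_interval and timeout are assumptions chosen for illustration; a suitable quiet period depends on how your files are written.

```python
import os
import time


def wait_until_stable(path, quiet_period=2.0, poll_interval=0.5, timeout=60.0):
    """Block until the size of path has not changed for quiet_period seconds.

    Returns True once the size has been stable, False if timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    last_change = time.monotonic()
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file not visible yet, or briefly inaccessible
        now = time.monotonic()
        if size != last_size:
            # Size changed: restart the quiet-period clock.
            last_size = size
            last_change = now
        elif size >= 0 and now - last_change >= quiet_period:
            return True
        time.sleep(poll_interval)
    return False
```

Calling this from on_created blocks the event-handling thread for at least quiet_period seconds per file, so with many files it may be better to hand the path to a worker thread. A stable size is only a heuristic: a writer that stalls for longer than quiet_period will still be mistaken for a finished file.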

Similarly, when a file is being uploaded to your directory in bits and pieces, it will generate a constant stream of file-modified Watchdog events. You can poll these instead of file size, waiting until they've stopped for a reasonable length of time, and then assume the file is complete and proceed.
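One way to sketch this event-based variant is a small tracker that remembers when each path last produced an event and reports the paths that have been quiet long enough. The class QuietFileTracker and its method names are hypothetical, not part of Watchdog; the lock is there because Watchdog calls handlers from its own thread.

```python
import threading
import time


class QuietFileTracker:
    """Remembers the last event time per path and reports quiet paths."""

    def __init__(self, quiet_period=2.0):
        self.quiet_period = quiet_period
        self._last_seen = {}
        self._lock = threading.Lock()

    def touch(self, path, now=None):
        """Record that a created/modified event was just seen for path."""
        stamp = time.monotonic() if now is None else now
        with self._lock:
            self._last_seen[path] = stamp

    def pop_ready(self, now=None):
        """Return, and forget, paths with no events for quiet_period seconds."""
        stamp = time.monotonic() if now is None else now
        with self._lock:
            ready = [p for p, t in self._last_seen.items()
                     if stamp - t >= self.quiet_period]
            for p in ready:
                del self._last_seen[p]
        return ready
```

You would call tracker.touch(event.src_path) from both on_created and on_modified, and call tracker.pop_ready() from the main loop (the one that otherwise only sleeps) to get the files that can be processed.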

None of these solutions are perfect, but this seems to be an inherent issue to Watchdog on Windows. Unfortunately the "perfect" solution seems to be "swap to Linux and use inotify".

Jesse
2

Try reading the file in a while loop:

import time

def on_created(event):

    ...

    # Wait for the file transfer: retry until the file can be opened.
    file = None
    while file is None:
        try:
            file = open(event.src_path)
        except OSError:
            print("WAITING FOR FILE TRANSFER....")
            time.sleep(3)
Aimery
  • For future reference: I tried this code on Ubuntu with Python 3.9.9 and Watchdog 2.0.2, and open() doesn't raise an exception when opening the file even if it's still being copied (I used cp on a 2GB file and saw in my logs that it was still copying while this code was running). – Oren_C Dec 13 '21 at 10:04
0

Instead of using elapsed time as an indicator, the cleanest solution would be to monitor only certain types of files, using the patterns argument of a PatternMatchingEventHandler.

Simply append '.temp' to the name of every file you are uploading/writing, and rename each one to its real name when it's finished.

Set the patterns to look for '*.temp' files, and monitor their renaming to whatever type of file you desire using the FileSystemMovedEvent event (and its associated Handler.on_moved() method). Its dest_path value will hold the new name of the file, which by then is completely written.
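A minimal sketch of both halves of this convention, standard library only. The helper names write_then_rename and finished_path are hypothetical; the '.temp' suffix is the one suggested above. It assumes you control the program that writes the files and that the temporary and final names are in the same directory.

```python
import os

TEMP_SUFFIX = ".temp"


def write_then_rename(final_path, data, temp_suffix=TEMP_SUFFIX):
    """Writer side: write under a temporary name, then rename when done."""
    temp_path = final_path + temp_suffix
    with open(temp_path, "wb") as f:
        f.write(data)
    # The rename happens only after the file has been fully written and closed.
    os.replace(temp_path, final_path)


def finished_path(src_path, dest_path, temp_suffix=TEMP_SUFFIX):
    """Watcher side: return dest_path if a temp file was renamed to its real name.

    Returns None for any other kind of move.
    """
    if src_path.endswith(temp_suffix) and not dest_path.endswith(temp_suffix):
        return dest_path
    return None
```

Inside on_moved(self, event) you would call finished_path(event.src_path, event.dest_path) and process the result when it is not None.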