14

I am using the Python watchdog module on a Windows 2012 server to monitor new files appearing on a shared drive. When watchdog notices the new file it kicks off a database restore process.

However, it seems that watchdog will attempt to restore the file the second it is created and not wait till the file has finished copying to the shared drive. So I changed the event to on_modified but there are two on_modified events, one when the file is initially being copied and one when it is finished being copied.

How can I handle the two on_modified events to only fire when the file being copied to the shared drive has finished?

What happens when multiple files are copied to the shared drive at the same time?

Here is my code:

import time
import subprocess
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class NewFile(FileSystemEventHandler):
    def process(self, event):
        if event.is_directory:
            return

        if event.event_type == 'modified':
            if getext(event.src_path) == 'gz':
                load_pgdump(event.src_path)

    def on_modified(self, event):
        self.process(event)

def getext(filename):
    "Get the file extension"
    file_ext = filename.split(".",1)[1]
    return file_ext

def load_pgdump(src_path):    
    restore = 'pg_restore command ' + src_path
    subprocess.call(restore, shell=True)

def main():
    event_handler = NewFile()
    observer = Observer()
    observer.schedule(event_handler, path='Y:\\', recursive=True)
    observer.start()

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

if __name__ == '__main__':
    main()
tjmgis
  • How are you transferring the file? Can you also upload a checksum? The idea of doing a DB restore without one is scary to me. If you go the "check, delay, check again" route, you're going with optimism. –  Oct 15 '15 at 10:41
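One way to act on the checksum suggestion in this comment, sketched under assumptions: the sender also uploads a sidecar file named `<dump>.sha256` (a hypothetical convention, not something from the question) whose first token is the hex SHA-256 digest, and the restore only runs when the digests match. The helper names `sha256_of` and `is_verified` are made up for illustration.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_verified(path):
    """True only if a sidecar '<path>.sha256' exists and matches the file.

    The sidecar naming is an assumption made for this sketch.
    """
    sidecar = path + ".sha256"
    if not os.path.exists(sidecar):
        return False
    with open(sidecar) as fh:
        tokens = fh.read().split()
    if not tokens:
        return False
    return sha256_of(path) == tokens[0].lower()
```

A partially copied dump would fail this check, so the handler could call `is_verified(event.src_path)` before `load_pgdump` instead of relying on timing alone.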

10 Answers

11

In your on_modified handler, just wait until the file has finished copying by watching its size.

Offering a Simpler Loop:

import os
import time

# `filename` is the path received from the event (event.src_path)
historicalSize = -1
while historicalSize != os.path.getsize(filename):
    historicalSize = os.path.getsize(filename)
    time.sleep(1)
print("file copy has now finished")
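The same idea can be wrapped in a function with a timeout, so a stalled copy does not block the handler forever. This is a minimal sketch: the name `wait_until_stable` and the default values for `interval`, `checks`, and `timeout` are arbitrary choices for illustration, and a stable size remains a heuristic rather than proof that the copy finished.

```python
import os
import time


def wait_until_stable(path, interval=1.0, checks=2, timeout=600.0):
    """Block until the size of `path` is unchanged for `checks` consecutive polls.

    Returns True once the size has settled, False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    stable = 0
    while time.monotonic() < deadline:
        size = os.path.getsize(path)
        if size == last_size:
            stable += 1
            if stable >= checks:
                return True
        else:
            # Size changed: restart the count from this new size.
            stable = 0
            last_size = size
        time.sleep(interval)
    return False
```

In the question's handler this could be called as `if wait_until_stable(event.src_path): load_pgdump(event.src_path)`.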
defermat
Mtl Dev
  • 2
    I think it'll work if you watch only one file. If you watch a directory, you can have multiple interleaved on_modified event for different files. The logic would then be a bit more complicated than that. – autra Nov 21 '19 at 13:54
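For the interleaved case this comment describes, one option is to keep per-file state instead of blocking inside the handler. The following is a sketch only: the class `SizeTracker` and its methods are hypothetical names, and the `quiet_seconds` default is arbitrary. The handler records each event and returns immediately, while a separate polling loop asks which files have stopped growing.

```python
import os
import threading
import time


class SizeTracker:
    """Track several files at once and report those whose size has settled."""

    def __init__(self, quiet_seconds=2.0):
        self.quiet_seconds = quiet_seconds
        self._seen = {}  # path -> (last size, time that size was first seen)
        self._lock = threading.Lock()  # handlers run in the observer thread

    def touch(self, path):
        """Record an event for `path`; intended to be called from on_modified."""
        with self._lock:
            size = os.path.getsize(path)
            previous = self._seen.get(path)
            if previous is None or previous[0] != size:
                self._seen[path] = (size, time.monotonic())

    def pop_ready(self):
        """Return (and forget) paths whose size was stable for quiet_seconds."""
        now = time.monotonic()
        ready = []
        with self._lock:
            for path in list(self._seen):
                size, since = self._seen[path]
                try:
                    current = os.path.getsize(path)
                except OSError:
                    del self._seen[path]  # file vanished; stop tracking it
                    continue
                if current != size:
                    self._seen[path] = (current, now)
                elif now - since >= self.quiet_seconds:
                    ready.append(path)
                    del self._seen[path]
        return ready
```

With this, on_modified would only call `tracker.touch(event.src_path)`, and the `while True: time.sleep(1)` loop in the question's `main()` could process whatever `tracker.pop_ready()` returns.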
4

I'm using the following code to wait until the file has been copied (Windows only):

from ctypes import windll
import time

def is_file_copy_finished(file_path):
    finished = False

    GENERIC_WRITE         = 1 << 30
    FILE_SHARE_READ       = 0x00000001
    OPEN_EXISTING         = 3
    FILE_ATTRIBUTE_NORMAL = 0x80

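    # CreateFileW expects a Unicode string; decode anything that is not already str
    # (this is the fix suggested in the comments).
    if not isinstance(file_path, str):
        file_path_unicode = file_path.decode('utf-8')
    else:
        file_path_unicode = file_path

    h_file = windll.Kernel32.CreateFileW(file_path_unicode, GENERIC_WRITE, FILE_SHARE_READ, None, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, None)

    # CreateFileW returns INVALID_HANDLE_VALUE (-1) when the file cannot be opened.
    if h_file != -1:
        windll.Kernel32.CloseHandle(h_file)
        finished = True

    print('is_file_copy_finished: ' + str(finished))
    return finished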

def wait_for_file_copy_finish(file_path):
    while not is_file_copy_finished(file_path):
        time.sleep(0.2)

wait_for_file_copy_finish(r'C:\testfile.txt')

The idea is to try to open the file for writing in share-read mode. The call fails while another process is still writing to the file.

Enjoy ;)

Dmytro
  • This reliably supports copying over Windows' Remote Desktop Protocol (RDP). RDP creates the file handle, then reads the full file contents into memory before writing, which makes the commonly recommended "check file size every 0.x seconds" approach unreliable for large files. Thanks! – Tyler Dane Nov 18 '20 at 21:12
  • Doc to `windll`'s `CreateFileW` call: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew – Tyler Dane Nov 18 '20 at 21:13
  • There is a small bug in the above code: if isinstance(file_path, str): should be if _not_ isinstance(file_path, str): – Phlogi Jan 05 '22 at 08:14
3

I would post this as a comment, since it is a different approach rather than an answer to your question, but I don't have enough rep yet. You could try monitoring the file size: if it stops changing, you can assume the copy has finished:

import os
import time

path = 'name of file being copied'
size2 = -1
while True:
    size = os.path.getsize(path)
    if size == size2:
        break  # size unchanged since the last check: assume the copy is done
    size2 = size
    time.sleep(2)
iri
2

On Linux you also get a closed event. The solution there is to hold off processing the file until it has been closed. My approach would be to add on_closed handling.

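from watchdog.events import FileSystemEventHandler

class Handler(FileSystemEventHandler):
    def __init__(self):
        super().__init__()
        self.files_to_process = set()

    def dispatch(self, event):
        # Route only the two event types we care about; ignore everything else.
        _method_map = {
            'created': self.on_created,
            'closed': self.on_closed
        }
        handler = _method_map.get(event.event_type)
        if handler is not None:
            handler(event)

    def on_created(self, event):
        self.files_to_process.add(event.src_path)

    def on_closed(self, event):
        # discard() avoids a KeyError for files created before the watch started
        self.files_to_process.discard(event.src_path)
        actual_processing(event.src_path)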
ravenwing
1

I had a similar issue recently with watchdog. A rather simple but not very smart workaround for me was to check the change in file size in a while loop using a two-element list, one for 'past', one for 'now'. Once the values are equal, the copying is finished.

Edit: something like this.

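import os
import time

# value = [past, now]; file_path is the file being copied
value = [-1, os.path.getsize(file_path)]
while value[0] != value[1]:
    # change: wait, then shift 'now' into 'past' and take a fresh reading
    time.sleep(1)
    value = [value[1], os.path.getsize(file_path)]
# test passed: both readings are equal, so the copy is assumed finished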
jake77
1

This works for me. Tested on Windows as well, with Python 3.7.

size_past = -1
while True:
    size_now = os.path.getsize(event.src_path)
    if size_now == size_past:
        log.debug("file has copied completely now size: %s", size_now)
        break
    else:
        size_past = size_now
        log.debug("file copying size: %s", size_past)
        # the sleep has to go here; a statement placed after `break` never runs
        time.sleep(1)
sathish
1

Old I know, but I recently came up with a solution for this exact problem. In my case, I was only concerned with wav and mp3 files. This function will ensure that only files that are completely copied will be sent to makerCore() because the created placeholder files do not have any extension and will always end up in 'not ready'. Once the file is completed it will trigger the watchdog module again except this time with an extension. This will work on multiple files simultaneously as well.

def on_created(event):
    #print(event)
    if str(event.src_path).endswith('.mp3') or str(event.src_path).endswith('.wav'):
        makerCore(event)
    else:
        print('not ready')
LaytonGB
CLipp
0

I am using a different approach that might not be the most elegant one, but it is easy to do on any platform if you have control over the side copying the file.

Just add 'in-progress' to the name of the file until the copying is complete, and then rename the file. You can then have a while loop waiting for the file with the name without 'in-progress' to exist and you're good.
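A minimal sketch of the sending side, assuming you control the copy and that the temporary name lives in the same directory as the final one. The function name `copy_then_publish` and the `.in-progress` suffix are illustrative choices, not anything prescribed by watchdog.

```python
import os
import shutil


def copy_then_publish(src, dst, suffix=".in-progress"):
    """Copy `src` to `dst + suffix`, then rename it to `dst` once the copy is done."""
    tmp = dst + suffix
    shutil.copyfile(src, tmp)
    # The final name only appears after all the data has been written.
    os.replace(tmp, dst)
    return dst
```

On the watching side, the handler would ignore any path ending in the suffix. With watchdog, the rename may surface as an on_moved event whose `event.dest_path` is the final name, though whether that event is delivered on a network share depends on the platform.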

Yohan Obadia
0

I've tried the check filesize - wait - check again routine many have suggested above but it's not very reliable. To make it work better I've added a check if the file is still locked.

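    file_done = False
    file_size = -1

    # First wait until the size stops changing.
    while file_size != os.path.getsize(file_path):
        file_size = os.path.getsize(file_path)
        time.sleep(1)

    # Then wait until the file is no longer locked by the copying process.
    while not file_done:
        try:
            # Renaming a file to itself fails on Windows while it is still locked.
            os.rename(file_path, file_path)
            file_done = True
        except OSError:
            time.sleep(1)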
b3d
0

Following up on ravenwing's answer, more details about on_closed in watchdog can be found here. As mentioned in the linked issue, there is no documentation available for on_closed yet, and it can only be used on Unix.

Rohit Chaku