
I'm planning on using ftplib to monitor a server for new files. Is there a way to see if a new file is still being transferred?

Here is a partial solution that finds new files:

Monitor remote FTP directory

from ftplib import FTP
from time import sleep

ftp = FTP('localhost')
ftp.login()

def changemon(dir='./'):
    ls_prev = set()

    while True:
        ls = set(ftp.nlst(dir))

        add, rem = ls-ls_prev, ls_prev-ls
        if add or rem: yield add, rem

        ls_prev = ls
        sleep(5)

for add, rem in changemon():
    print('\n'.join('+ %s' % i for i in add))
    print('\n'.join('- %s' % i for i in rem))

ftp.quit()
gnarbarian

1 Answer


I'm assuming that you want to know if you can determine if a file is in the process of being transferred by a different FTP connection. In general, this is not possible because there is no FTP command to ask if there is a file transfer in progress.

You could fall back on a heuristic: poll the SIZE command via FTP.size and watch whether the file grows over time. If the size stays the same for some duration, assume the transfer has finished. SIZE is an extension defined in RFC 3659 rather than part of the original FTP specification, so not every server supports it.
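A minimal sketch of that polling heuristic follows. The helper names (`remote_size`, `is_size_stable`) and the defaults for `checks` and `interval` are illustrative assumptions, not part of ftplib; only `FTP.size`, `FTP.voidcmd`, and `error_perm` come from the library. Switching to binary mode first is a precaution, since some servers refuse SIZE in ASCII mode.

```python
import time
from ftplib import error_perm


def remote_size(ftp, path):
    """Return the size the server reports for path, or None if unavailable."""
    # Precaution: some servers only answer SIZE in binary (TYPE I) mode.
    ftp.voidcmd('TYPE I')
    try:
        return ftp.size(path)
    except error_perm:
        # File missing, or the server does not support SIZE.
        return None


def is_size_stable(get_size, checks=3, interval=5.0, sleep=time.sleep):
    """Return True if get_size() gives the same non-None value `checks` times in a row.

    get_size is any zero-argument callable; `checks` and `interval` are
    arbitrary illustrative defaults and should be tuned to the uploaders.
    """
    last = get_size()
    for _ in range(checks - 1):
        sleep(interval)
        current = get_size()
        if current is None or current != last:
            return False
        last = current
    return last is not None
```

With a logged-in `ftp` object as in the question, a call could look like `is_size_stable(lambda: remote_size(ftp, 'incoming/data.bin'))`, where the path is a placeholder. A `True` result only means the size did not change during the observation window, not that the upload is guaranteed to be finished.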

Note that this heuristic can report a file as complete while it is still being transferred, so make sure you are OK with occasionally processing truncated files.

Keep in mind that FTP transfers are sometimes interrupted and resumed later. If that happens, you may need a very long quiet period before you can treat a file as complete, unless you know for sure how big the file is expected to be.

If you have control over your clients, you could require them to place a metafile next to each upload that states the expected size of the file. You would then know for certain when a file is done uploading by simply checking its size. Similarly, you could use an MD5 or other external validity check. Another approach relies on the file itself being self-describing, with an internal integrity check or file length. Many standard file formats include such a header.
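As a sketch of the metafile idea, the snippet below assumes a hypothetical JSON sidecar of the form `{"size": 1234, "md5": "..."}` uploaded next to each file. The sidecar format and the helper names (`parse_sidecar`, `upload_complete`, `md5_matches`, `fetch_chunks`) are invented for illustration; only `hashlib`, `json`, and `FTP.retrbinary` are real library calls.

```python
import hashlib
import json


def parse_sidecar(text):
    """Parse the assumed sidecar format: {"size": <int>, "md5": "<hex digest>"}."""
    meta = json.loads(text)
    return int(meta["size"]), meta["md5"].lower()


def upload_complete(actual_size, expected_size):
    """A file counts as complete once the reported size equals the declared size."""
    return actual_size is not None and actual_size == expected_size


def md5_matches(chunks, expected_md5):
    """Hash an iterable of byte chunks and compare with the declared digest."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()


def fetch_chunks(ftp, path):
    """Download path over an existing ftplib.FTP connection as a list of chunks."""
    chunks = []
    ftp.retrbinary('RETR ' + path, chunks.append)
    return chunks
```

The size check is cheap and can run on every poll; the MD5 check requires downloading the file, so it is probably best run once, after the size already matches.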

b4hand
  • You are correct in your assumption. I wonder if I could try to rename the file as a test. – gnarbarian Feb 11 '16 at 01:50
  • Using rename as a check probably won't work on Unix FTP servers. Unlike Windows, Unix does not lock open files and renaming an open file won't interrupt the file transfer either. – b4hand Feb 11 '16 at 01:53
  • well it is a windows ftp server at the moment but in the future we want to move to a different platform. Would trying to move the file mess up the transfer or would it prevent me from doing so? – gnarbarian Feb 11 '16 at 01:55
  • Probably neither, but it'll depend on the actual FTP server implementation. On Unix, typically open files are associated with file descriptors, *not names*, and those file descriptors are pointers to an inode, so renaming a file only affects the containing directory, not the file itself. Thus you can rename a file which is being modified through calls to write, and those writes will simply appear in the final renamed file. – b4hand Feb 11 '16 at 02:01
  • On Windows, trying to rename a file that is open may give you a permission denied or similar error. – b4hand Feb 11 '16 at 02:05
  • You're right about the Windows behavior. I already have a server-side script to do this, and it operates by repeatedly trying to move the file until it succeeds. The Windows error looks like this: "[Error 32] The process cannot access the file because it is being used by another process". If it were possible through the FTP protocol, it would be more portable for us because I'd be able to monitor an FTP server that we do not administer. I'll just continue to use the server-side script and maybe use something else for the remote. – gnarbarian Feb 11 '16 at 02:11
  • FWIW, I strongly recommend relying on some external form of synchronization like an external checksum or file length. This will be portable and give you more robust behavior if you're trying to build a form of drop box system for files. – b4hand Feb 11 '16 at 02:14
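The rename behavior discussed in these comments can be observed locally without an FTP server. The following is a small illustrative demo (the function and file names are made up): it renames a file while a writer still has it open and then reads back the result. On a POSIX system the later write is expected to land in the renamed file; on Windows the `os.rename` call is expected to fail with a `PermissionError` instead.

```python
import os
import tempfile


def rename_while_open():
    """Rename a file that is still open for writing, then return its final contents."""
    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, "upload.part")
        new = os.path.join(tmp, "upload.done")
        writer = open(old, "w")
        try:
            writer.write("first half, ")
            writer.flush()
            # POSIX: only the directory entry changes, the open descriptor is unaffected.
            # Windows: this is where a PermissionError would be expected.
            os.rename(old, new)
            writer.write("second half")
        finally:
            writer.close()
        with open(new) as result:
            return result.read()
```

This is why a successful rename says nothing about whether an upload has finished on a Unix server, which is consistent with the recommendation to rely on an external checksum or file length.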