
Backstory: I'm trying to pull some data from an FTP login I was given. The data gets updated roughly daily, and I believe they wipe the FTP server at the end of each week or month. I was thinking about inputting a date and having the script run daily to check for files matching that date, but if the server's time isn't accurate it could cause data loss. For now I just want to download ALL the files, and then I'll work on fine-tuning it.
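For illustration, here is a rough sketch of the bookkeeping I have in mind for later: keep a local ledger of filenames already fetched, so nothing depends on the server's clock being accurate. The ledger path and helper names here are just placeholders, not part of the script below.

import os

LEDGER = "downloaded.txt"  # placeholder path for the local ledger

def load_seen(ledger=LEDGER):
    # Names of files already fetched, one per line.
    if not os.path.exists(ledger):
        return set()
    with open(ledger) as fd:
        return set(line.strip() for line in fd)

def mark_seen(name, ledger=LEDGER):
    # Record a name only once its download has fully completed.
    with open(ledger, "a") as fd:
        fd.write(name + "\n")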

I haven't worked much with coding FTP before, but it seems simple enough. However, the problem I'm having is that small files download without a problem and their file sizes check out and match. When the script tries to download a big file that would normally take a few minutes, it gets to a certain point (almost completing the file), then just stops and hangs.

For Example:

It tries to download a file that is 373485927 bytes in size. The script runs and downloads that file up to 373485568 bytes. It ALWAYS stops at this amount, even after trying different methods and changing some code.

I don't understand why it always stops at this byte count, and why it works fine with smaller files (1000 bytes and under).

import os
import sys
import base64
import ftplib

def get_files(ftp, filelist):
    for f in filelist:
        try:
            print "Downloading file " + f + "\n"
            local_file = os.path.join('.', f)
            file = open(local_file, "wb")
            ftp.retrbinary('RETR ' + f, file.write)
        except ftplib.all_errors, e:
            print str(e)

        file.close()
    ftp.quit()

def list_files(ftp):
    print "Getting directory listing...\n"
    ftp.dir()
    filelist = ftp.nlst()
    #determine new files to DL, pass to get_files()
    #for now we will download all each execute
    get_files(ftp, filelist)

def get_conn(host,user,passwd):
    ftp = ftplib.FTP()
    try:
        print "\nConnecting to " + host + "...\n"
        ftp.connect(host, 21)
    except ftplib.all_errors, e:
        print str(e)

    try:
        print "Logging in...\n"
        ftp.login(user, base64.b64decode(passwd))
    except ftplib.all_errors, e:
        print str(e)

    ftp.set_pasv(True)

    list_files(ftp)

def main():
    host = "host.domain.com"
    user = "admin"
    passwd = "base64passwd"

    get_conn(host,user,passwd)

if __name__ == '__main__':
    main()

Output looks like this, with dddd.tar.gz being the big file that never finishes.

Downloading file aaaa.del.gz

Downloading file bbbb.del.gz

Downloading file cccc.del.gz

Downloading file dddd.tar.gz

DeNi
  • Sounds like a buffering issue. The size it stops at is a multiple of 4096 (373485568 = 4096 × 91183), which is a very likely value for a buffer. Interestingly, retrbinary's default buffer is 8192, and the size it stops at is **not** a multiple of 8192 (373485568 / 8192 = 45591.5), so that buffer is not the one at fault. – spectras Jan 26 '17 at 22:35
  • As a side note, `file` is a built-in name in Python; you should not shadow it with a variable. Rename it to, for instance, `fd` (for "file descriptor"). I doubt it's what is causing the problem here, but you should eliminate that possibility first. – spectras Jan 26 '17 at 22:37
  • By the way, the issue might be on the remote side, for instance the FTP server not flushing properly at the end - what FTP server are you using? And did you test with another one? – spectras Jan 26 '17 at 22:46
  • @spectras I don't run the FTP server. I suppose I could test it with my own. However, using the command "mget *" with the basic ftp client from a Debian terminal works perfectly fine on that server. I assume what I'm trying to do is like replicating "mget *" behavior, to just grab all files. – DeNi Jan 27 '17 at 22:49
  • If that FTP server is public, can you add it so we might test your code? Otherwise, do you at least know what FTP server it is (name and release)? – spectras Jan 28 '17 at 14:35
  • The reason I ask is that Python's ftplib has a rather simplistic implementation of RETR. It always waits for the server to close the data connection. The RFC, however, does not say the server *has* to do that, so some may not. In that case, some data might remain unsent in server-side, kernel-level buffers, waiting for ftplib to take the initiative of closing the connection. If that's what is happening, you'll have no choice but to use another FTP-capable library, such as [pycurl](http://pycurl.io/); see the sketch after this thread. – spectras Jan 28 '17 at 15:02
  • @spectras nmap is telling me the server is vsftpd 2.2.2 and OS type unix – DeNi Jan 29 '17 at 00:36
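Following up on the pycurl suggestion in the comments: a minimal sketch of fetching a single file with pycurl, which closes the data connection itself once the transfer completes instead of waiting for the server to do it. The URL layout and plain-text password here are simplifying assumptions.

import pycurl

def fetch_with_pycurl(host, user, passwd, filename):
    # Unlike ftplib's retrbinary, pycurl does not rely on the server
    # closing the data socket to detect the end of the transfer.
    fd = open(filename, "wb")
    c = pycurl.Curl()
    c.setopt(pycurl.URL, "ftp://%s/%s" % (host, filename))
    c.setopt(pycurl.USERPWD, "%s:%s" % (user, passwd))
    c.setopt(pycurl.WRITEDATA, fd)
    c.perform()
    c.close()
    fd.close()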

2 Answers


This could be caused by a timeout issue. Where you currently have:

def get_conn(host,user,passwd):
    ftp = ftplib.FTP()

try adding larger timeouts until you have more of an idea of what's going on, like:

def get_conn(host,user,passwd):
    ftp = ftplib.FTP(timeout=100)

I'm not sure whether ftplib defaults to a timeout or not; it would be worth checking, and also worth checking whether you are being timed out by the server. Hope this helps.
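For reference, a minimal sketch of the two places a timeout can be set; the 100-second value is arbitrary:

import socket
import ftplib

# Per-connection timeout, in seconds (applies to the control
# connection and to data connections created from it).
ftp = ftplib.FTP(timeout=100)

# Or a process-wide default applied to every new socket.
socket.setdefaulttimeout(100)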

Nick H
  • Thanks, I'll try that. The documentation says it uses the global default timeout, which I assume means this: `import socket; print socket.getdefaulttimeout()` gives `None`. – DeNi Jan 26 '17 at 18:40
  • Setting a high timeout didn't work. The download just stops at the same byte again. I never get disconnected from the server, so I don't think it was ever a timeout issue. – DeNi Jan 26 '17 at 18:58
  • It is most certainly **not** a timeout issue. The chance it would break at a perfect multiple of 4096, the most common page size, is really low. – spectras Jan 28 '17 at 14:34

If you are running your script in a Windows cmd console, try disabling the "QuickEdit Mode" option of cmd.

I had encountered a problem where my FTP script hung when running on Windows but worked normally on Linux. In the end I found that this solution worked for me.

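If it helps, here is an untested sketch of turning QuickEdit Mode off from within the script via the Windows console API, so a stray click in the console window cannot pause the process. The flag values (0x0040 for ENABLE_QUICK_EDIT_MODE, 0x0080 for ENABLE_EXTENDED_FLAGS) come from the Win32 console documentation.

import ctypes

def disable_quickedit():
    # Clearing ENABLE_QUICK_EDIT_MODE requires setting
    # ENABLE_EXTENDED_FLAGS at the same time.
    kernel32 = ctypes.windll.kernel32
    handle = kernel32.GetStdHandle(-10)  # STD_INPUT_HANDLE
    mode = ctypes.c_uint32()
    kernel32.GetConsoleMode(handle, ctypes.byref(mode))
    kernel32.SetConsoleMode(handle, (mode.value & ~0x0040) | 0x0080)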

funway