
I have been trying to write a Python function that downloads the most recently added file from an FTP server, judged by the timestamp embedded in the filename.


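The filenames follow the pattern S01375T-YYYY-MM-DD-hh-mm-ss.csv (for example S01375T-2016-12-01-10-59-03.csv), so each name carries a full timestamp.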

With help from forums, I have the code below so far. It sorts by the date column of the directory listing (the date the file was actually added to the FTP server). However, I want to adjust it so that it sorts the files by the timestamp within the filename instead.

EDIT (tried to clean up the code a bit):

import ftplib
import os
import socket
import time

def DownloadFileFromFTPServer2(server, username, password, directory_to_file, file_to_write):
    try:
        f = ftplib.FTP(server)
    except (socket.error, socket.gaierror) as e:
        print('cannot reach %s: %s' % (server, e))
        return
    print("Connected to FTP server")

    try:
        f.login(username, password)
    except ftplib.error_perm:
        print("cannot log in as %s" % username)
        f.quit()
        return
    print("Logged on to the FTP server")
    try:
        f.cwd(directory_to_file)
        print("Directory has been set")
    except ftplib.all_errors as inst:
        print(inst)
        f.quit()
        return

    data = []
    f.dir(data.append)
    datelist = []
    filelist = []

    for line in data:
        print(line)
        col = line.split()
        # columns 5-7 of a Unix-style listing hold the date, e.g. "Jan 14 10:44"
        datestr = ' '.join(col[5:8])
        date = time.strptime(datestr, '%b %d %H:%M')
        datelist.append(date)
        filelist.append(col[8])

    # map each listing date to its filename
    who = dict(zip(datelist, filelist))

    # Sort by dates and get the latest file by date
    for key in sorted(who, reverse=True):
        filename = who[key]

        print("File to download is %s" % filename)
        try:
            # the with block closes the local file even if RETR fails
            with open(filename, 'wb') as out:
                f.retrbinary('RETR %s' % filename, out.write)
        except ftplib.error_perm:
            print("Error: cannot read file %s" % filename)
            os.unlink(filename)
        else:
            print("***Downloaded*** %s" % filename)
            print("Retrieving FTP server data ......... DONE")

        # only the newest file is wanted; break (not return) so f.quit() still runs
        break

    f.quit()

    return 1

Any help is greatly appreciated. Thanks.

EDIT [SOLVED]:

The line

        date = time.strptime (datestr, '%b %d %H:%M')

should be replaced with:

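        # requires "import datetime" at the top of the script
        try:
            date = datetime.datetime.strptime(col[8], 'S01375T-%Y-%m-%d-%H-%M-%S.csv')
        except ValueError:
            # '.', '..' and any other name that doesn't match the pattern are skipped
            continue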

The try/continue is important because the first two listing entries, '.' and '..', do not match the pattern and would otherwise raise a ValueError.
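Pulled together, the solved approach could be sketched as a standalone helper. This is a minimal sketch, assuming Unix-style `LIST` lines (as collected by `f.dir(data.append)`) and the fixed `S01375T-` prefix; the name `newest_from_listing` is hypothetical. Instead of building a dict and sorting it, it keeps the newest match while looping:

```python
import datetime

# Filename pattern from the question; assumes the "S01375T-" prefix never changes.
PATTERN = 'S01375T-%Y-%m-%d-%H-%M-%S.csv'

def newest_from_listing(lines):
    """Return the filename with the latest embedded timestamp, or None.

    `lines` are Unix-style LIST lines, e.g. as collected by ftp.dir(data.append).
    """
    newest_name = None
    newest_date = None
    for line in lines:
        # split into at most 9 fields so field 8 (the name) keeps any spaces
        col = line.split(None, 8)
        if len(col) < 9:
            continue  # e.g. a "total 12" header line
        try:
            date = datetime.datetime.strptime(col[8], PATTERN)
        except ValueError:
            continue  # '.', '..' and other non-matching names
        if newest_date is None or date > newest_date:
            newest_date, newest_name = date, col[8]
    return newest_name
```

With the question's variables, `newest_from_listing(data)` would return the name to pass to `RETR`, or `None` if nothing matched.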

mozcelikors
  • We don't need all that FTP code. A minimal, complete, and verifiable example would include the input list and the expected result (output list / item). The rest is just pollution. – Jean-François Fabre Jan 14 '17 at 10:44
  • I agree. Sorry for the mess. Will try to clean it – mozcelikors Jan 14 '17 at 10:48
  • If all files start with `S01375T-`, get the full names and sort them; they should sort as you expect. If they start with different text of the same length, use slicing, `"S01375T-2016-12-01-10-59-03.csv"[8:-4]` -> `"2016-12-01-10-59-03"`, and sort these strings; they should sort as you expect. – furas Jan 14 '17 at 11:08
  • If the names start with text of different lengths but the first `-` always comes before the year, use `split('-', 1)`: `"S01375T-2016-12-01-10-59-03.csv".split('-', 1)[1]` -> `'2016-12-01-10-59-03.csv'`, and sort these strings. – furas Jan 14 '17 at 11:15
  • Possible duplicate of [python ftp get the most recent file by date](https://stackoverflow.com/questions/8990598/python-ftp-get-the-most-recent-file-by-date) – Cees Timmerman Aug 08 '17 at 08:24
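The sorting suggestions in the comments can be compared side by side. A minimal sketch using the sample names from one of the answers; the variable names are illustrative only:

```python
names = [
    "S01375T-2016-03-01-12-00-00.csv",
    "S01375T-2016-01-01-13-00-00.csv",
    "S01375T-2016-04-01-13-01-00.csv",
]

# Same fixed prefix everywhere: plain sorting already orders by timestamp.
by_full_name = sorted(names)

# Prefix of varying text but constant length: sort on the slice holding
# the timestamp ("S01375T-" is 8 characters, ".csv" is 4).
by_slice = sorted(names, key=lambda n: n[8:-4])

# Prefix of varying length, but the first '-' always precedes the year:
# sort on everything after that first '-'.
by_split = sorted(names, key=lambda n: n.split('-', 1)[1])

# All three orderings agree here; the newest file is the last element.
newest = by_split[-1]
```

The slice and split keys only matter once the prefixes differ; for the question's files, all three give the same order.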

3 Answers


Once you have the list of filenames, you can simply sort on the filename: since the naming convention is S01375T-YYYY-MM-DD-hh-mm-ss.csv, the names will naturally sort into date/time order. Note that if the S01375T- part varies, you could sort on the name split at a fixed position or at the first -.
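Because the largest name is also the newest one, `max()` is enough when only the latest file is needed. A sketch, assuming `ftp` is a logged-in `ftplib.FTP` already in the right directory and that `nlst()` returns bare filenames (some servers include paths); `newest_csv`, `download_newest`, and the glob pattern are hypothetical:

```python
import fnmatch

def newest_csv(names, pattern="S01375T-*.csv"):
    """Return the largest name matching `pattern`, or None.

    With a fixed prefix and zero-padded YYYY-MM-DD-hh-mm-ss timestamps,
    the lexicographically largest name is also the most recent one.
    """
    candidates = fnmatch.filter(names, pattern)  # drops '.', '..' and other files
    return max(candidates) if candidates else None

def download_newest(ftp, pattern="S01375T-*.csv"):
    # ftp: an already logged-in ftplib.FTP instance in the target directory
    name = newest_csv(ftp.nlst(), pattern)
    if name is not None:
        with open(name, "wb") as out:
            ftp.retrbinary("RETR " + name, out.write)
    return name
```

Using `max()` avoids sorting the whole list when only one file is wanted.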

If this were not the case, you could use the datetime.datetime.strptime method to parse the filenames into datetime instances.
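To illustrate where parsing matters, here is a sketch with a hypothetical day-month-year naming scheme (`S01375T-DD-MM-YYYY-hh-mm-ss.csv`, not the question's format), where plain string sorting would pick the wrong file; `parse_name` and `newest_by_parsed_time` are illustrative names:

```python
from datetime import datetime

# Hypothetical naming scheme that does NOT sort as text: day-month-year.
FORMAT = "S01375T-%d-%m-%Y-%H-%M-%S.csv"

def parse_name(name):
    """Return the datetime encoded in `name`, or None if it doesn't match FORMAT."""
    try:
        return datetime.strptime(name, FORMAT)
    except ValueError:
        return None

def newest_by_parsed_time(names):
    # pair each name with its parsed time, dropping names that don't parse
    stamped = [(parse_name(n), n) for n in names]
    stamped = [pair for pair in stamped if pair[0] is not None]
    return max(stamped)[1] if stamped else None
```

For example, `S01375T-01-01-2017-08-00-00.csv` is newer than `S01375T-31-12-2016-23-00-00.csv`, but compares smaller as a string; parsing first gets the order right.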

Of course, if you wished to really simplify things, you could use PyFileSystem's FTPFS and its various methods to treat the FTP server as if it were a slow local file system.

Steve Barnes
  • Hello, the filename is col[8] in my case. Applying date = time.strptime(str(col[8]), 'S01375T-%Y-%m-%d-%H-%M-%S.csv') results in the error: ValueError: time data '.' does not match format 'S01375T-%Y-%m-%d-%H-%M-%S.csv' – mozcelikors Jan 14 '17 at 11:04
  • You need datetime.datetime.strptime. If that doesn't work, try datetime.datetime.strptime(str(col[8])[8:-4], '%Y-%m-%d-%H-%M-%S'). – Steve Barnes Jan 14 '17 at 11:11
  • It's resolved. Thanks, Mr. Barnes. Anyone having the same problem should check my edit. – mozcelikors Jan 14 '17 at 11:17

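Try the -t option in ftp.dir. On servers that pass it through to ls, it lists entries newest first (by modification time, not by the timestamp in the filename). Then take the first entry and extract the name from it:

# ftp is an ftplib.FTP instance that is already logged in
data = []
ftp.dir('-t', data.append)
# each entry is a full listing line; the name is the last field
filename = data[0].split(None, 8)[-1]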
hansaplast

You need to extract the timestamp from the filename properly. You could split the filename at the first '-' and remove the file extension '.csv' (f.split('-', 1)[1][:-4]).
Then you just need to construct the datetime obj for sorting.

from datetime import datetime

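def sortByTimeStampInFile(fList):
    # map each parsed timestamp to its filename (non-.csv entries such as '.' and '..' are skipped)
    fileDict = {datetime.strptime(f.split('-', 1)[1][:-4], '%Y-%m-%d-%H-%M-%S'): f for f in fList if f.endswith('.csv')}
    # return the filenames in ascending timestamp order (oldest first)
    return [fileDict[k] for k in sorted(fileDict.keys())]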


files = ['S01375T-2016-03-01-12-00-00.csv', 'S01375T-2016-01-01-13-00-00.csv', 'S01375T-2016-04-01-13-01-00.csv']
print(sortByTimeStampInFile(files))

Output:

['S01375T-2016-01-01-13-00-00.csv', 'S01375T-2016-03-01-12-00-00.csv', 'S01375T-2016-04-01-13-01-00.csv']

Btw., as long as your time format is 'year-month-day-hour-min-sec' with zero-padded fields, a simple string sort would do it:

>>> sorted([f.split('-', 1)[1][:-4] for f in files if f.endswith('.csv')])
['2016-01-01-13-00-00', '2016-03-01-12-00-00', '2016-04-01-13-01-00']
Maurice Meyer