import os
from datetime import datetime
from dateutil.relativedelta import relativedelta
from dateutil import parser
import pysftp

lt_all = []

# disable hostkey checking
cnopts = pysftp.CnOpts()
cnopts.hostkeys = None

srv = pysftp.Connection('sftp.com', username = 'username', password = "password", cnopts = cnopts)
srv.chdir('download')
server_file_list = srv.listdir()

for lt_file in server_file_list:
    name = lt_file.lower()
    if srv.isfile(lt_file) and 'invoices' in name and 'daily' in name and lt_file.endswith('.csv'):
        try:
            # data_folder_path: local download root (not shown here)
            srv.get(lt_file, os.path.join(data_folder_path, 'Invoices', lt_file), preserve_mtime=True)
        except OSError as exc:
            print(f"Could not download {lt_file}: {exc}")

The good news: I have been successfully downloading all CSV files from the SFTP location.

The bad news: all CSV files are being downloaded on every run. Downloading 300+ files every day is wasteful, because most of them have already been downloaded.

These CSV files are generated daily and follow the same naming convention every day, e.g. invoices_daily_20200204.csv. Notice the date comes at the very end in yyyymmdd format. Edit: The format is actually mmddyy.

How can I limit my downloads to only files created in the last 14 days? Is pysftp the best module for this?

Martin Prikryl

2 Answers

1

With your originally claimed fixed, sortable timestamp format yyyymmdd, it would be easy. If you know that there will always be exactly 14 files to download, use the solution by @lllrnr101. If that is not certain, generate a threshold file name with a 14-day-old timestamp and compare it against the file names in the listing:

from datetime import datetime, timedelta

d14ago = datetime.now() - timedelta(14)
ts = datetime.strftime(d14ago, '%Y%m%d')
threshold = f"invoices_daily_{ts}.csv"

for lt_file in server_file_list:
    if srv.isfile(lt_file) and (lt_file >= threshold):
        srv.get(lt_file)  # download into the current local directory
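The threshold comparison above can be tried without a server. The following is a minimal sketch, assuming the yyyymmdd naming; the helper names `threshold_name` and `select_recent` are made up for illustration and are not part of pysftp.

```python
from datetime import date, timedelta

PREFIX = "invoices_daily_"  # assumed fixed prefix of every invoice file


def threshold_name(today: date, days: int = 14) -> str:
    """Build the file name that a file dated `days` days ago would have."""
    cutoff = today - timedelta(days=days)
    return f"{PREFIX}{cutoff:%Y%m%d}.csv"


def select_recent(names, today: date, days: int = 14):
    """Keep invoice files whose name sorts at or after the threshold name."""
    threshold = threshold_name(today, days)
    return sorted(
        n for n in names
        if n.startswith(PREFIX) and n.endswith(".csv") and n >= threshold
    )
```

With a live connection, the result of `select_recent(srv.listdir(), date.today())` could then be passed to `srv.get()` one name at a time.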

But it turned out that your timestamp format is mmddyy (%m%d%y), which is not lexicographically sortable. That complicates the solution. One thing you can do is to reorder the timestamp to yymmdd, which is lexicographically sortable, and format the threshold the same way:

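ts = datetime.strftime(d14ago, '%y%m%d')

for lt_file in server_file_list:
    if srv.isfile(lt_file) and lt_file.startswith("invoices_daily_"):
        # Reorder mmddyy to yymmdd so that string comparison follows date order
        file_ts = lt_file[19:21] + lt_file[15:17] + lt_file[17:19]
        if file_ts >= ts:
            srv.get(lt_file)  # download into the current local directory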
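An alternative to slicing and reordering the string is to parse the timestamp into a real date and compare dates. This is only a sketch under the assumption that names look exactly like invoices_daily_mmddyy.csv; `file_date` and `is_recent` are hypothetical helpers, not pysftp functions.

```python
import re
from datetime import date, datetime, timedelta

# Assumed naming convention: invoices_daily_<mmddyy>.csv
NAME_RE = re.compile(r"^invoices_daily_(\d{6})\.csv$", re.IGNORECASE)


def file_date(name: str):
    """Return the date encoded in the file name, or None if it does not match."""
    match = NAME_RE.match(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%m%d%y").date()
    except ValueError:
        # Six digits that do not form a valid mmddyy date
        return None


def is_recent(name: str, today: date, days: int = 14) -> bool:
    """True if the file name carries a date within the last `days` days."""
    parsed = file_date(name)
    return parsed is not None and parsed >= today - timedelta(days=days)
```

Comparing dates rather than strings also behaves correctly across a year boundary, and names that do not match the pattern are simply skipped.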

Two side notes:

Martin Prikryl
0

Since you already have the yyyymmdd naming format, why not just sort your server_file_list based on that and take the server_file_list[-14:] slice? A simple server_file_list.sort() will work, since all names share the same prefix and differ only in the date.

If you were not following the naming convention, you could use the stat() method provided by pysftp to read each file's modification time (SFTP generally does not expose a creation time) and sort your entire server_file_list based on that. Then take the server_file_list[-14:] slice.
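The timestamp-based idea can be sketched as a filter over directory entries. This assumes each entry exposes `filename`, `st_mode` and `st_mtime`, as the attribute objects returned by pysftp's `listdir_attr()` do; `recent_by_mtime` is a made-up helper, and the demo entries are stand-ins rather than real server data.

```python
import stat
import time
from types import SimpleNamespace


def recent_by_mtime(entries, now=None, days=14):
    """Return names of regular files modified within the last `days` days, oldest first."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    recent = [
        e for e in entries
        if e.st_mode is not None and stat.S_ISREG(e.st_mode)
        and e.st_mtime is not None and e.st_mtime >= cutoff
    ]
    return [e.filename for e in sorted(recent, key=lambda e: e.st_mtime)]


# Stand-in entries for a dry run without a server (hypothetical data)
demo_now = 1_600_000_000
demo_entries = [
    SimpleNamespace(filename="a.csv", st_mode=0o100644, st_mtime=demo_now - 1 * 86400),
    SimpleNamespace(filename="b.csv", st_mode=0o100644, st_mtime=demo_now - 20 * 86400),
    SimpleNamespace(filename="c.csv", st_mode=0o100644, st_mtime=demo_now - 2 * 86400),
    SimpleNamespace(filename="archive", st_mode=0o040755, st_mtime=demo_now),
]
demo_recent = recent_by_mtime(demo_entries, now=demo_now)
```

Against a live connection this would be called as `recent_by_mtime(srv.listdir_attr())`. Filtering by age instead of slicing the last 14 entries keeps the result right when a day has no file or more than one.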

lllrnr101