I am working on an ETL job for the first time, and I need to fetch some files from the client's SFTP server. The number of files varies, so I need to check whether each file exists before getting it. The file names follow the pattern "file_YYYY-MM-DD-number-n", where YYYY-MM-DD is the current date and n is the number of the file. So if there are 7 files, I have to look for names 1 through 7, such as:

  • file_2019-08-25-number-1
  • file_2019-08-25-number-2

So far I have found that I can do something like this:

import pysftp

cnopts = pysftp.CnOpts()
with pysftp.Connection(host=host, port=port, username=username, password=password, cnopts=cnopts) as sftp:
    # List the names of all entries in the remote directory.
    files = sftp.listdir(directory)

How do I find the files I need in that list?

Martin Prikryl
Carlos Salazar

2 Answers


To check for the existence of a file with pysftp, use the Connection.exists method:

with pysftp.Connection(...) as sftp:
    if sftp.exists("file_2019-08-25-number-1"):
        print("1 exists")
    if sftp.exists("file_2019-08-25-number-2"):
        print("2 exists")
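Since the number of files varies, the same check can be run in a loop. The following is only a sketch: the helper name existing_numbered_files and the max_files cap are hypothetical, and it assumes the files are numbered contiguously from 1, so the loop stops at the first missing number. The exists argument is any callable that takes a name and returns a boolean, for example sftp.exists.

```python
from datetime import date


def existing_numbered_files(exists, day, max_files=1000):
    """Collect file_<day>-number-1, -2, ... until the first name that is missing.

    `exists` is any callable returning True/False for a file name
    (hypothetical helper; assumes numbering is contiguous and starts at 1).
    """
    found = []
    for n in range(1, max_files + 1):
        name = f"file_{day.isoformat()}-number-{n}"
        if not exists(name):
            break  # first gap: assume there are no further files
        found.append(name)
    return found
```

Inside the with block this could be called as existing_numbered_files(sftp.exists, date.today()), assuming the files sit in the current remote directory; otherwise the directory would need to be prepended to each name.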

That said, it is better not to use pysftp in the first place, as it is a dead project. Use Paramiko instead (see pysftp vs. Paramiko).

To check file existence with Paramiko, use SFTPClient.stat. See How to check if the created file using Paramiko exec_command exists.
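A minimal sketch of the stat-based check follows. The helper name remote_file_exists is hypothetical. It assumes that a missing file surfaces as an IOError whose errno is ENOENT, which is how Paramiko reports "no such file"; any other error (for example a permission problem) is re-raised rather than treated as "missing".

```python
import errno


def remote_file_exists(sftp, path):
    """Return True if sftp.stat(path) succeeds, False if the file is missing.

    `sftp` is expected to be a Paramiko SFTPClient, but any object with a
    compatible stat() method works (hypothetical helper).
    """
    try:
        sftp.stat(path)
    except IOError as exc:
        if exc.errno == errno.ENOENT:
            return False  # server reported "no such file"
        raise  # other failures should not be mistaken for a missing file
    return True
```

With a connected paramiko.SSHClient named ssh (an assumed name), the client would come from ssh.open_sftp() and be passed in as remote_file_exists(sftp, "file_2019-08-25-number-1").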


Obligatory warning: Do not set cnopts.hostkeys = None, unless you do not care about security. For the correct solution see Verify host key with pysftp.

Martin Prikryl

You can use Python's built-in re regular expression module to determine whether a filename matches the general pattern you're looking for, as the example below does.

import re


files = [
    'file_2019-08-25-number-1',
    'foo.bar',
    'file_2019-08-25-number-2',
    'file_2018-02-28-number-42',
    'some_other_file.txt'
]

pattern = re.compile(r'file_\d{4}-\d{2}-\d{2}-number-\d+')

for filename in files:
    if pattern.match(filename):
        print(f'{filename!r} matches pattern')

Output:

'file_2019-08-25-number-1' matches pattern
'file_2019-08-25-number-2' matches pattern
'file_2018-02-28-number-42' matches pattern

If all you want to do is check for files from a specific date, you could test the filename prefix like this:

if filename.startswith('file_2019-08-25-number-'):
    # Do something with filename.
    ...
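To tie this back to the question, the list returned by sftp.listdir could be filtered down to the files for one date and ordered by file number. This is a sketch under assumptions: the helper name files_for_day is hypothetical, and it assumes the names have no extension or suffix after the number, which is why it uses fullmatch.

```python
import re
from datetime import date


def files_for_day(names, day):
    """Return the names matching file_<day>-number-<n>, sorted by n.

    Hypothetical helper; assumes nothing follows the number in the name.
    """
    pattern = re.compile(rf"file_{re.escape(day.isoformat())}-number-(\d+)")
    numbered = []
    for name in names:
        match = pattern.fullmatch(name)
        if match:
            # Keep the number as an int so that 10 sorts after 2.
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]
```

For the current date, this would be called as files_for_day(files, date.today()), where files is the result of listdir.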
martineau