1
for remote_path in list_of_stfp_paths:
    with pysftp.Connection(HOSTNAME, username=USERNAME, password=PASSWORD) as sftp:
        sftp.get(remote_path, str(local_path))

    #checks distinct count of a column for the csv downloaded, deletes it later
    df = pd.read_csv(str(local_path))
    print(df['taken_time'].value_counts())
    os.remove(str(local_path))

The code I use is above. It's just run in a for loop with multiple remote paths. Sometimes, it gets completed. Sometimes, I get an error saying

Exception: Authentication failed.

Martin Prikryl
  • 188,800
  • 56
  • 490
  • 992
LearnerJS
  • 298
  • 2
  • 14

1 Answers1

1

Do not reconnect for each file. Loop the downloads only, not the connection:

with pysftp.Connection(HOSTNAME, username=USERNAME, password=PASSWORD) as sftp:
    for remote_path in list_of_stfp_paths:
        sftp.get(remote_path, str(local_path))

        #checks distinct count of a column for the csv downloaded, deletes it later
        df = pd.read_csv(str(local_path))
        print(df['taken_time'].value_counts())
        os.remove(str(local_path))

Though note that you do not even have to download the files to a local disk, just read them straight from the SFTP server:

with pysftp.Connection(HOSTNAME, username=USERNAME, password=PASSWORD) as sftp:
    for remote_path in list_of_stfp_paths:
        with sftp.open(remote_path) as f:
            f.prefetch()
            #checks distinct count of a column for the csv
            df = pd.read_csv(f)
            print(df['taken_time'].value_counts())

It might even be faster as it allows the download and parsing happen in parallel, not in sequence. See Read CSV/Excel files from SFTP file, make some changes in those files using Pandas, and save back

Martin Prikryl
  • 188,800
  • 56
  • 490
  • 992