What's the easiest way to retrieve FTP files based on a list of filenames (in multiple FTP directories) - Python

Question

In FTP, the structure looks like this:

main_folder / year / month / day / multiple csv files

For example:

main_folder / 2020 / 02 / 03 / '2020-02-03_01.csv', '2020-02-03_02.csv', '2020-02-03_03.csv', .....

main_folder / 2020 / 03 / 03 / '2020-03-03_01.csv', '2020-03-03_02.csv', '2020-03-03_03.csv', .....
main_folder / 2021 / 01 / 01 / '2021-01-01_01.csv', '2021-01-01_02.csv', '2021-01-01_03.csv', .....

So each year has 12 folders (one per month), each month contains one folder per day, and each day folder holds multiple CSV files (the filename consists of the date followed by _xx.csv).

I have a list of filenames that I want to download, for example:

example_list = ['2021-08-09_01.csv', '2021-08-09_02.csv', '2021-08-10_12.csv',
                '2021-08-10_03.csv']

My current code extracts the year, month, and day from the filename and constructs the corresponding FTP directory. For example, for the file '2021-08-09_01.csv' it looks at all the files under main_folder/2021/08/09. But if I use the complete path so that FTP only looks at that specific file, I get the error ftplib.error_perm: 550 No such directory.

This is the code:

file_dir = "main_folder/2021/08/09/2021-08-09_01.csv"

ftp_conn = open_ftp_connection(ftp_host, ftp_username, ftp_password, file_dir)
ftp = ftplib.FTP_TLS(host)
ftp.login(username, password)
ftp.cwd(file_dir)

I'm a bit confused here: how can I tell FTP to look for those files in the corresponding directories and read their data? (The end goal is to publish them to an S3 bucket.)

When changing directory, you have to change to the directory itself, not to the file in it. Try `ftp.cwd("main_folder/2021/08/09")`, then download the file afterwards. — Cow, Aug 11 '21 at 08:58
After this, how can I tell FTP to only download the target file instead of downloading all the files under `main_folder/2021/08/09`? — wawawa, Aug 11 '21 at 09:02
Thanks a lot (I've been struggling with FTP for a couple of days...) — wawawa, Aug 11 '21 at 09:05
Just give the actual file name to the `retrbinary` call. See https://stackoverflow.com/q/11573817/850848#39719174 — Martin Prikryl, Aug 11 '21 at 09:16

Cow · Accepted Answer · 2021-08-11T12:03:00.907

2

This is how I would do it:

import ftplib, os

example_list = ['2021-08-09_01.csv', '2021-08-09_02.csv', '2021-08-10_12.csv', '2021-08-10_03.csv']

FTP_IP = "1.2.3.4"
FTP_LOGIN = "username"
FTP_PASSWD = "password"
CURRENT_DIR = os.getcwd()
MAIN_DIR = "/main_folder"

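with ftplib.FTP(FTP_IP, FTP_LOGIN, FTP_PASSWD) as ftp:
    for entry in example_list:
        # '2021-08-09_01.csv' -> year '2021', month '08', day '09'
        year, month, rest = entry.split("-")
        day = rest.split("_")[0]
        # Absolute path, so the result does not depend on the previous cwd
        ftp.cwd(MAIN_DIR + "/" + year + "/" + month + "/" + day)
        with open(os.path.join(CURRENT_DIR, entry), 'wb') as f:
            # retrbinary expects the full FTP command, not just the filename
            ftp.retrbinary("RETR " + entry, f.write)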

Each file is downloaded to the directory you run the Python script from, under the same filename it has on the server.
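Since the end goal in the question is an S3 upload rather than local files, one variation is to read each file into memory instead of writing it to disk. A minimal sketch, assuming the /main_folder/YYYY/MM/DD layout from the question and a server that accepts a full path in RETR (most do); `remote_path` and `fetch_bytes` are hypothetical helper names, and the upload step itself is left out:

```python
import io
import posixpath


def remote_path(filename, main_dir="/main_folder"):
    """Map '2021-08-09_01.csv' to '/main_folder/2021/08/09/2021-08-09_01.csv'."""
    date_part = filename.split("_")[0]  # e.g. '2021-08-09'
    year, month, day = date_part.split("-")
    return posixpath.join(main_dir, year, month, day, filename)


def fetch_bytes(ftp, filename, main_dir="/main_folder"):
    """Download one file over an open ftplib connection and return its bytes."""
    buf = io.BytesIO()
    # Pass the full remote path to RETR, so no cwd() call is needed
    ftp.retrbinary("RETR " + remote_path(filename, main_dir), buf.write)
    return buf.getvalue()
```

The returned bytes can then be handed to whatever S3 client you use, for example as the `Body` of boto3's `put_object`.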

edited Aug 11 '21 at 12:03

answered Aug 11 '21 at 09:18

Cow


Hi, thanks for this. What if I don't have a list of filenames and just want to download every CSV file available on the FTP server? Instead of changing the FTP directory many times, is there an easy way to do it? – wawawa Aug 11 '21 at 10:38
Isn't that exactly the opposite of what you have asked for before?: *"how can I tell FTP to only download the target file instead of downloading all the files under"* – Martin Prikryl Aug 11 '21 at 10:48

Yeah, that's true. I realized that I need both mechanisms for my use case. I finally worked them out, thanks for the help! – wawawa Aug 11 '21 at 11:03
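For the "download every CSV file" case raised in the comments, one possible approach is to walk the directory tree and collect the paths first. This is only a sketch: it assumes the server supports the MLSD command (not all do), and `iter_csv_paths` is a hypothetical helper name, not something from the thread:

```python
import posixpath


def iter_csv_paths(ftp, root="/main_folder"):
    """Yield the full path of every .csv file below root, depth first."""
    # Materialise the listing before recursing into subdirectories
    for name, facts in list(ftp.mlsd(root)):
        if name in (".", ".."):
            continue
        path = posixpath.join(root, name)
        kind = facts.get("type")
        if kind == "dir":
            yield from iter_csv_paths(ftp, path)
        elif kind == "file" and name.lower().endswith(".csv"):
            yield path
```

Each yielded path can then be passed to `ftp.retrbinary("RETR " + path, ...)` in the same way as a single file.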
