
In FTP, the structure looks like this:

main_folder / year / month / day / multiple csv files

For example:

main_folder / 2020 / 02 / 03 / '2020-02-03_01.csv', '2020-02-03_02.csv', '2020-02-03_03.csv', .....

main_folder / 2020 / 03 / 03 / '2020-03-03_01.csv', '2020-03-03_02.csv', '2020-03-03_03.csv', .....
main_folder / 2021 / 01 / 01 / '2021-01-01_01.csv', '2021-01-01_02.csv', '2021-01-01_03.csv', .....

So each year has 12 folders (one per month), each month contains multiple folders (one per day), and each day has multiple csv files (each filename consists of the date followed by _xx.csv).

I have a list of filenames that I want to download, for example:

example_list = ['2021-08-09_01.csv', '2021-08-09_02.csv', '2021-08-10_12.csv',
                '2021-08-10_03.csv']
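Given the layout above, each filename in such a list determines the directory it should live in. A minimal sketch of that mapping follows; `remote_dir_for` and its regular expression are hypothetical and assume every name follows the `YYYY-MM-DD_xx.csv` pattern shown in the examples:

```python
import posixpath
import re

# Assumed filename pattern: date, underscore, digits, ".csv"
_NAME_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})_\d+\.csv$")


def remote_dir_for(filename, main_dir="main_folder"):
    """Return the year/month/day directory that should hold `filename`."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename format: {filename!r}")
    year, month, day = match.groups()
    # FTP paths use forward slashes regardless of the local OS
    return posixpath.join(main_dir, year, month, day)


example_list = ['2021-08-09_01.csv', '2021-08-09_02.csv',
                '2021-08-10_12.csv', '2021-08-10_03.csv']
for name in example_list:
    print(remote_dir_for(name), name)
```

The result is a directory only; the filename itself is kept separate because FTP's change-directory command accepts directories, not files.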

My current code extracts the year, month, and day from the filename and builds the corresponding FTP directory. For example, for the file '2021-08-09_01.csv' it looks at all the files under main_folder/2021/08/09. But if I pass the complete path, including the filename, so that FTP only looks at that specific file, I get the error ftplib.error_perm: 550 No such directory.

This is the code:

file_dir = "main_folder/2021/08/09/2021-08-09_01.csv"

ftp_conn = open_ftp_connection(ftp_host, ftp_username, ftp_password, file_dir)
ftp = ftplib.FTP_TLS(host)
ftp.login(username, password)
ftp.cwd(file_dir)

I'm a bit confused here. How can I tell FTP to look for those files in the corresponding directories and read their data? (The end goal is to publish them to an S3 bucket.)

wawawa
  • When changing directory, you have to change to the directory itself, not to a file in it. Try `ftp.cwd("main_folder/2021/08/09")`, then download the file afterwards. – Cow Aug 11 '21 at 08:58
  • After this, how can I tell FTP to only download the target file instead of downloading all the files under `main_folder/2021/08/09`? – wawawa Aug 11 '21 at 09:02
  • Give me a few minutes, I will make an example for you. – Cow Aug 11 '21 at 09:04
  • Thanks a lot (I've been struggling with FTP for a couple of days...) – wawawa Aug 11 '21 at 09:05
  • Just give the actual file name to the `retrbinary` call. See https://stackoverflow.com/q/11573817/850848#39719174 – Martin Prikryl Aug 11 '21 at 09:16

1 Answer


This is how I would do it:

import ftplib
import os

example_list = ['2021-08-09_01.csv', '2021-08-09_02.csv', '2021-08-10_12.csv', '2021-08-10_03.csv']

FTP_IP = "1.2.3.4"
FTP_LOGIN = "username"
FTP_PASSWD = "password"
CURRENT_DIR = os.getcwd()
MAIN_DIR = "/main_folder"

with ftplib.FTP(FTP_IP, FTP_LOGIN, FTP_PASSWD) as ftp:
    for entry in example_list:
        # '2021-08-09_01.csv' -> year '2021', month '08', rest '09_01.csv'
        year, month, rest = entry.split("-")
        day = rest.split("_")[0]
        # Absolute path, so each cwd() is independent of the previous one
        directory = f"{MAIN_DIR}/{year}/{month}/{day}"
        ftp.cwd(directory)
        with open(os.path.join(CURRENT_DIR, entry), 'wb') as f:
            # retrbinary() expects a full FTP command: "RETR <filename>"
            ftp.retrbinary("RETR " + entry, f.write)

Each file is downloaded to the directory you run the Python script from, under the same filename it has on the server.
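If the data is headed for S3 rather than local disk, the file can be read into memory instead. The sketch below is one possible variant, not a tested recipe for your server: `fetch_bytes` is a hypothetical helper, the host and credentials in `main` are placeholders, and it assumes the server accepts a full path in the `RETR` command (many do, but not all), which avoids calling `cwd()` at all. It uses `FTP_TLS` because the question does.

```python
import ftplib
import io


def fetch_bytes(ftp, remote_path):
    """Download one remote file into memory and return its contents as bytes."""
    buffer = io.BytesIO()
    # Assumption: the server resolves a full path given to RETR
    ftp.retrbinary("RETR " + remote_path, buffer.write)
    return buffer.getvalue()


def main():
    # Placeholder host and credentials; replace with real values
    with ftplib.FTP_TLS("ftp.example.com") as ftp:
        ftp.login("username", "password")
        ftp.prot_p()  # switch the data connection to TLS as well
        data = fetch_bytes(ftp, "main_folder/2021/08/09/2021-08-09_01.csv")
        print(len(data), "bytes downloaded")
```

Calling `main()` with real connection details downloads a single file; the returned bytes can then be handed to whichever S3 client you use. If the server rejects full paths in `RETR`, fall back to `cwd()` into the directory and pass only the filename.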

Cow
  • Hi, thanks for this. What if I don't have a list of filenames and just want to download every csv file available on the FTP server? Is there an easy way to do that without changing the FTP directory lots of times? – wawawa Aug 11 '21 at 10:38
  • Isn't that exactly the opposite of what you have asked for before?: *"how can I tell FTP to only download the target file instead of downloading all the files under"* – Martin Prikryl Aug 11 '21 at 10:48
  • Yeah, it's true. I realized that I need both mechanisms for my use case. I finally worked them out, thanks for the help! – wawawa Aug 11 '21 at 11:03
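For the "download every csv file" case raised in the comments, one possible approach is to walk the year/month/day tree and collect the paths first. The sketch below is not the solution the asker ended up with (that was not posted); `iter_csv_paths` is a hypothetical helper, and it assumes the server supports the `MLSD` command, which `ftplib` exposes as `FTP.mlsd()` but which some older servers lack.

```python
import posixpath


def iter_csv_paths(ftp, root):
    """Yield the path of every .csv file under `root`, depth first."""
    # mlsd() yields (name, facts) pairs; facts["type"] distinguishes
    # files from directories (requires MLSD support on the server)
    for name, facts in ftp.mlsd(root):
        if name in (".", ".."):
            continue
        path = posixpath.join(root, name)
        kind = facts.get("type")
        if kind == "dir":
            # Recurse into year, month, and day folders
            yield from iter_csv_paths(ftp, path)
        elif kind == "file" and name.lower().endswith(".csv"):
            yield path
```

With a connected `ftp` object, `list(iter_csv_paths(ftp, "main_folder"))` gives the paths, each of which can then be retrieved with a `RETR` command. On servers without `MLSD`, `ftp.nlst()` is a possible fallback, though it does not say which entries are directories.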