
I have a working version of Python code that lists the files in each subfolder and saves the listings into a list. However, there are quite a lot of files, so the code takes a very long time to execute. Is there a way to optimize the code below?

ftp_subdir_list = ['example_folder/2021/01/01', 'example_folder/2021/01/02', 'example_folder/2021/01/03',..................................., 'example_folder/2021/08/08', 'example_folder/2021/08/09']

import ftplib

# host, username and password are defined elsewhere
ftp_file_list = []
for dir in ftp_subdir_list:
    # log in to the FTP server
    ftp = ftplib.FTP_TLS(host)
    ftp.login(username, password)
    ftp.cwd(dir)

    file_list = ftp.nlst()
    ftp_file_list.append(file_list)

print(ftp_file_list)

Because there are quite a lot of folders and each folder has around 20 files, is there a way to optimize the code and reduce the execution time? Is the for loop what is slowing it down? Thanks.

wawawa

1 Answer


Move the login outside the for loop; it is likely to be one of the culprits. However, as with any performance problem, measure where the time is actually spent. Some things you cannot change, such as the time a directory listing takes; others you can.
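To make the "measure first" advice concrete, here is a rough sketch of how the time per step could be collected. It is only an illustration: `timed`, `list_dirs_timed` and the `connect` callable are hypothetical names, not part of the question's code, and `connect` is assumed to return an already logged-in `ftplib.FTP_TLS` object.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, totals):
    """Add the wall-clock time spent inside the with-block to totals[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] = totals.get(label, 0.0) + (time.perf_counter() - start)


def list_dirs_timed(connect, subdirs):
    """Mirror the question's loop (one login per folder) and time each step.

    connect is a hypothetical zero-argument callable that returns a
    logged-in FTP object, e.g. one that wraps FTP_TLS(host) and login().
    """
    totals = {}
    listings = []
    for subdir in subdirs:
        with timed("connect+login", totals):
            ftp = connect()
        with timed("cwd", totals):
            ftp.cwd(subdir)
        with timed("nlst", totals):
            listings.append(ftp.nlst())
        with timed("quit", totals):
            ftp.quit()
    return listings, totals
```

Calling `listings, totals = list_dirs_timed(connect, ftp_subdir_list)` and printing `totals` should show whether the repeated connect and login dominates, or whether the listings themselves are the slow part.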

Andrew
  • Hmm, thanks. The thing is, the logic changes if I move the login part, because `ftp.cwd(dir)` needs to be called for each `dir` in the list... – wawawa Aug 09 '21 at 14:50
  • I see. You can cd back to the root before each one, or you can change the paths in the input. In fact, if you have a full path, I think you can just list that. – Andrew Aug 09 '21 at 14:51
  • @MartinPrikryl Hi, I don't think FTP will recognize the absolute path... It only works with the full path such as `'example_folder/2021/01/01'`; it'll complain with something like `dir does not exist` if I give it an absolute dir such as `01` – wawawa Aug 09 '21 at 21:07
  • `01` is not an absolute path. An absolute path looks like `/example_folder/2021/01/01` or `/home/user/example_folder/2021/01/01`. – Martin Prikryl Aug 10 '21 at 05:41
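Pulling the answer and the comments together, a minimal sketch of the single-login approach might look like the following. The names `list_subdirs` and `fetch_listings` and the `root` parameter are assumptions made for illustration; the real absolute prefix (for example `/` or `/home/user`) depends on the server and is not given in the question.

```python
import ftplib


def list_subdirs(ftp, subdirs, root="/"):
    """Return one file list per subdirectory, reusing a logged-in connection.

    root is a hypothetical absolute prefix; prepending it makes every path
    absolute, so each cwd() works no matter where the previous one ended up.
    """
    listings = []
    for subdir in subdirs:
        path = root.rstrip("/") + "/" + subdir.strip("/")
        ftp.cwd(path)
        listings.append(ftp.nlst())
    return listings


def fetch_listings(host, username, password, subdirs, root="/"):
    """Log in once, list every folder, and close the connection afterwards."""
    with ftplib.FTP_TLS(host) as ftp:
        ftp.login(username, password)
        return list_subdirs(ftp, subdirs, root)
```

With this shape, `ftp_file_list = fetch_listings(host, username, password, ftp_subdir_list)` would produce the same list of lists as the original loop, but with one connection and one login instead of one per folder.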