
I am currently using os.walk to navigate through all the subfolders and files in a massive network-drive directory. However, whenever my VPN disconnects, the for loop fails. When I re-run my code the next day, I would like to resume from the last file that was processed. What modifications should I make to my code below?

import os
import boto3

directory = '//DirectoryName/FolderName'

s3_client = boto3.client('s3')  # bucket and Target_File are defined elsewhere in my script

for root, dirs, files in os.walk(os.path.normpath(directory), topdown=False):
    for name in files:
        Source_File = os.path.join(root, name)
        # This uploads the file to the S3 bucket
        s3_client.upload_file(Source_File, bucket, Target_File)

The directory is really massive: it has hundreds of sub-folders and thousands of files in total.

  • Keep track of the files you already processed in a separate file – rdas Sep 16 '22 at 17:56
  • Are you sure that what you are doing is legal? – treuss Sep 16 '22 at 17:57
  • @treuss What do you mean? I am doing this work as part of my job. – Devansh Popat Sep 16 '22 at 18:40
  • @rdas That is a good point, but how do I resume from where I left off the previous day? – Devansh Popat Sep 16 '22 at 18:40
  • You read the file at the start of the script, loading all the file names into a set or something similar. Then, when walking the directory tree, you can skip any files which are already in the set (see the sketch after these comments). – rdas Sep 16 '22 at 18:44
  • [python - Continue from given folder when walking recursively through folder - Stack Overflow](https://stackoverflow.com/questions/73723605/continue-from-given-folder-when-walking-recursively-through-folder/73725393#73725393) – furas Sep 16 '22 at 23:07
  • Maybe you should use `rsync`; it checks whether you have a newer file and sends only the newer ones, so it will skip files which you already sent. It will also resend the last file if it was only partially sent. [backup - Rsync to AWS S3 bucket - Server Fault](https://serverfault.com/questions/754690/rsync-to-aws-s3-bucket) – furas Sep 16 '22 at 23:10
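
A minimal sketch of rdas's suggestion, assuming a local log file named processed_files.txt (a name chosen here purely for illustration), that s3_client is a boto3 client as in the question, and that the bucket name and key layout below are placeholders:

import os
import boto3

directory = '//DirectoryName/FolderName'
log_path = 'processed_files.txt'  # hypothetical log of files already uploaded
bucket = 'my-bucket'              # placeholder bucket name

s3_client = boto3.client('s3')

# Load the names of files uploaded on previous runs into a set.
processed = set()
if os.path.exists(log_path):
    with open(log_path) as f:
        processed = {line.rstrip('\n') for line in f}

with open(log_path, 'a') as log:
    for root, dirs, files in os.walk(os.path.normpath(directory), topdown=False):
        for name in files:
            Source_File = os.path.join(root, name)
            if Source_File in processed:
                continue  # already uploaded on an earlier run; skip it
            Target_File = os.path.relpath(Source_File, directory)  # hypothetical key layout
            s3_client.upload_file(Source_File, bucket, Target_File)
            # Record the file only after the upload succeeds, and flush so the
            # log survives an abrupt VPN disconnect.
            log.write(Source_File + '\n')
            log.flush()

Each run re-walks the whole tree but only re-uploads files that are not yet in the log; a file interrupted mid-upload is simply uploaded again on the next run.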

0 Answers