I am trying to download 12,000 files from s3 bucket using jupyter notebook, which is estimating to complete download in 21 hours. This is because each file is downloaded one at a time. Can we do multiple downloads parallel to each other so I can speed up the process?
Currently, I am using the following code to download all files
### Get unique full-resolution image basenames
images = df['full_resolution_image_basename'].unique()
print(f'No. of unique full-resolution images: {len(images)}')
### Create a folder for full-resolution images
images_dir = './images/'
os.makedirs(images_dir, exist_ok=True)
### Download images
images_str = "','".join(images)
limiting_clause = f"CONTAINS(ARRAY['{images_str}'],
full_resolution_image_basename)"
_ = download_full_resolution_images(images_dir,
limiting_clause=limiting_clause)