I am writing a script that will download a large number of files from S3. I know I can write all the files to a directory and use them from there, but what I really want is the text contained in these files, and given how many there are I would like to avoid downloading them all to disk. In the past I have downloaded them to a junk directory and then deleted the directory, but I want a more elegant solution. Below is my code for downloading from S3:
import boto3

s3 = boto3.resource('s3', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)
# select bucket
my_bucket = s3.Bucket('<bucket_name>')
for item in my_bucket.objects.all():
    # save each object under its key name inside the junk directory
    my_bucket.download_file(item.key, f'../<junk_directory>/{item.key}')
Ideally I would like to use the tempfile package, as it seems to fit my needs nicely. Is there a way to pass a tempfile to the filename parameter of download_file?
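Something like the sketch below is what I have in mind, though I have not tested it and I am not sure a NamedTemporaryFile can be reused this way (the read-back step is my guess):

import tempfile

with tempfile.NamedTemporaryFile() as tmp:
    # my guess: pass the temp file's path as the Filename argument,
    # letting boto3 write into it
    my_bucket.download_file(item.key, tmp.name)
    tmp.seek(0)  # rewind before reading back what was written to the path
    text = tmp.read().decode('utf-8')
# the temp file is deleted automatically when the with-block exits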
This Stack Overflow question discusses manipulating temporary files, but I cannot see how to adapt that solution to work with S3. How would I temporarily download S3 content, for storage and performance gains?
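Alternatively, would reading the object body straight into memory avoid the filesystem entirely? A rough, untested sketch of what I mean:

for item in my_bucket.objects.all():
    # my assumption: reading the object's bytes directly skips
    # the disk round-trip entirely
    body = item.get()['Body'].read()
    text = body.decode('utf-8')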
Thank you in advance!