I am writing a script that downloads a large number of files from S3. I know I can write all the files to a directory and work with them from there. What I really want is the text contained in these S3 files, and because there are so many of them I would like to avoid downloading them all to disk. In the past, I have downloaded them to a junk directory and then deleted the directory, but I want a more elegant solution. Below is my code for downloading from S3:

import os
import boto3

s3 = boto3.resource('s3', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)
# select bucket
my_bucket = s3.Bucket('<bucket_name>')
for item in my_bucket.objects.all():
    # derive a local filename from the object key
    filename = os.path.basename(item.key)
    my_bucket.download_file(item.key, f'../<junk_directory>/{filename}')

Ideally, I would like to use the tempfile module, as it seems to fit my needs nicely. Is there a way to pass a tempfile to the filename parameter of download_file?
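One way this could look is sketched below, assuming the goal is to hand each object to some processing step as a local path and then discard it. The names `process_bucket` and `handle_file` are hypothetical, introduced only for illustration; `bucket` is expected to be a boto3 `Bucket` resource such as `my_bucket` above. The sketch uses `tempfile.TemporaryDirectory` rather than a single named temporary file, because `download_file` expects a path string and the directory is removed automatically when the `with` block exits.

```python
import os
import tempfile


def process_bucket(bucket, handle_file):
    """Download each object to a throwaway directory and process it.

    `bucket` is assumed to be a boto3 Bucket resource; `handle_file` is a
    hypothetical callable that receives a local path and returns a result.
    """
    results = {}
    with tempfile.TemporaryDirectory() as tmp_dir:
        for index, item in enumerate(bucket.objects.all()):
            # Keys may contain "/", so build a flat local name; the index
            # prefix avoids collisions between keys with the same basename.
            local_name = f"{index}_{os.path.basename(item.key)}"
            local_path = os.path.join(tmp_dir, local_name)
            bucket.download_file(item.key, local_path)
            results[item.key] = handle_file(local_path)
            # Delete right away so only one object is on disk at a time.
            os.remove(local_path)
    # The temporary directory itself is removed when the with block exits.
    return results


# Hypothetical usage, reusing the bucket from the question:
#   import boto3
#   my_bucket = boto3.resource('s3').Bucket('<bucket_name>')
#   sizes = process_bucket(my_bucket, os.path.getsize)
```

If the processing library accepts a file object instead of a path, `Bucket.download_fileobj` with `tempfile.TemporaryFile()` may be an alternative that avoids naming the file at all.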

This Stack Overflow question talks about manipulating temporary files, but I cannot see how to adapt its solution to work with S3. How can I download S3 content temporarily, to save storage and improve performance?

Thank you in advance!

John Rotenstein
Josh Zwiebel
  • Does this answer your question? https://stackoverflow.com/questions/44043036/how-to-read-image-file-from-s3-bucket-directly-into-memory. It is for an image but can be used for any type of content – Equinox Oct 08 '20 at 18:15
  • Ooh. This looks really useful. I will try this and update the question if that was a correct answer! – Josh Zwiebel Oct 08 '20 at 18:17
  • Does it make sense to store a pdf as StringIO or BytesIO? – Josh Zwiebel Oct 08 '20 at 18:27
  • Well, technically everything is bytes, so if it suits your use case it should be fine, but I am curious what you want to do with the PDF next – Equinox Oct 08 '20 at 18:30
  • download the pdf, convert it to a txt file and then process the txt file – Josh Zwiebel Oct 08 '20 at 18:37
  • Then I guess it depends on your choice of library too, because most would take a file object; see the [second answer](https://stackoverflow.com/a/44045097/5660284) – Equinox Oct 08 '20 at 18:44
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/222756/discussion-between-josh-zwiebel-and-venky). – Josh Zwiebel Oct 09 '20 at 03:10
  • See also [Amazon boto3 download file from S3 to tempfile](https://stackoverflow.com/q/58441526/1048572) – Bergi Dec 07 '21 at 11:29
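The in-memory approach suggested in the comments could be sketched roughly as follows, assuming each object is small enough to fit in RAM. The function names `read_object_bytes` and `iter_bucket_contents` are hypothetical; `bucket` is assumed to be a boto3 `Bucket` resource, whose `download_fileobj` method writes into any binary file-like object.

```python
import io


def read_object_bytes(bucket, key):
    """Return the full contents of one S3 object as bytes, without touching disk.

    `bucket` is assumed to be a boto3 Bucket resource.
    """
    buffer = io.BytesIO()
    # download_fileobj streams the object into the in-memory buffer.
    bucket.download_fileobj(key, buffer)
    return buffer.getvalue()


def iter_bucket_contents(bucket):
    """Lazily yield (key, bytes) pairs so only one object is held at a time."""
    for item in bucket.objects.all():
        yield item.key, read_object_bytes(bucket, item.key)


# Hypothetical usage, reusing the bucket from the question:
#   for key, data in iter_bucket_contents(my_bucket):
#       pdf_stream = io.BytesIO(data)  # pass to a library that accepts file objects
```

For binary content such as PDFs, `BytesIO` is the natural fit; `StringIO` would only apply after decoding the bytes to text.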

0 Answers