
I use put_object to copy from one S3 bucket to another, cross-region and cross-partition. The problem is that the file sizes have become more unpredictable, and since get_object loads the whole object into memory, I end up giving it more resources than it needs most of the time.

Ideally I want to "stream" the download/upload process.

For example, given I have an object with hash 123abc456def789:

Scenario: Download/Upload object in chunks

  1. Download part 123 of the object and save it to memory
  2. Upload part 123 of the object, then remove it from memory
  3. ... and so on until 789

This way, what gets written to the buffer stays constant in size.

It was suggested that I use copy_object, but I transfer between the standard partition and GovCloud, so this is not possible. Ideally I want to get away from downloading to disk.
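
Something like the sketch below is what I'm imagining: ranged GETs against the source object combined with a multipart upload to the destination, so only one part sits in memory at a time. This is a rough, untested sketch; the profile, bucket, and key names are placeholders, and every part except the last has to be at least 5 MB.

import boto3

PART_SIZE = 8 * 1024 * 1024  # parts other than the last must be >= 5 MB

# separate sessions/credentials for the commercial and GovCloud partitions
src = boto3.session.Session(profile_name='commercial').client('s3')
dst = boto3.session.Session(profile_name='govcloud').client('s3')

size = src.head_object(Bucket='src-bucket', Key='big-object')['ContentLength']
mpu = dst.create_multipart_upload(Bucket='dst-bucket', Key='big-object')

parts = []
for part_number, start in enumerate(range(0, size, PART_SIZE), start=1):
    end = min(start + PART_SIZE, size) - 1
    # ranged GET: only this slice of the source object is downloaded
    chunk = src.get_object(Bucket='src-bucket', Key='big-object',
                           Range='bytes={}-{}'.format(start, end))['Body'].read()
    # upload the slice as one part, then let it go out of scope
    resp = dst.upload_part(Bucket='dst-bucket', Key='big-object',
                           PartNumber=part_number, UploadId=mpu['UploadId'],
                           Body=chunk)
    parts.append({'PartNumber': part_number, 'ETag': resp['ETag']})

dst.complete_multipart_upload(Bucket='dst-bucket', Key='big-object',
                              UploadId=mpu['UploadId'],
                              MultipartUpload={'Parts': parts})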

edmamerto

1 Answer


I had the same problem recently, and the answer from smallo on this question helped me find a solution, so all credit to him!

Basically, you can use the read method and pass it the amt parameter, specifying the number of bytes you want to read from the stream. You can call it multiple times until the stream is exhausted. It would look something like this:

import boto3
import io

# profile, bucket and key are placeholders for your own values
s3 = boto3.session.Session(profile_name=profile).resource('s3')
s3_obj = s3.Object(bucket_name=bucket, key=key)

# read the streaming body 512 bytes at a time; read() returns b'' (and
# write() returns 0) once the stream is exhausted, which ends the loop
body = s3_obj.get()['Body']
with io.FileIO('sample.txt', 'w') as file:
    while file.write(body.read(amt=512)):
        pass
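
As an aside, the streaming body also exposes iter_chunks, which yields chunks of up to chunk_size bytes until the stream runs out, so the while loop above can be written as a plain for loop. A minimal sketch of the equivalent version (same placeholder filename and chunk size):

body = s3_obj.get()['Body']
with io.FileIO('sample.txt', 'w') as file:
    # each chunk is a bytes object of at most 512 bytes;
    # the loop ends when the stream is exhausted
    for chunk in body.iter_chunks(chunk_size=512):
        file.write(chunk)
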
Leonardo Lima