
I wrote a Python script to process very large files (a few TB in total), which I'll run on an EC2 instance. Afterwards, I want to store the processed files in an S3 bucket. Currently, my script first saves the data to disk and then uploads it to S3. Unfortunately, this will be quite costly given the extra time spent waiting for the instance to first write to disk and then upload.

Is there any way to use boto3 to write files directly to an S3 bucket?

Edit: to clarify my question, I'm asking about writing an object that I already have in memory directly to S3, without first saving it to disk.

Richard Sun

2 Answers


You can use put_object for this. Just pass your bytes or seekable file-like object as the Body.

For example:

import boto3

client = boto3.client('s3')
response = client.put_object(
    Bucket='your-s3-bucket-name',                  # target bucket
    Key='path/to/your/object',                     # object key (the "file name" inside the bucket)
    Body=b'bytes or a seekable file-like object',  # in-memory bytes, a BytesIO, or an open file handle
)
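
For a more concrete sense of the call, here is a minimal sketch that uploads bytes held in memory and checks the HTTP status returned in the response; the bucket name and key are hypothetical:

import boto3

client = boto3.client('s3')

# Any bytes already in memory can go straight into Body; nothing is written to local disk.
payload = b'processed output'

response = client.put_object(
    Bucket='my-example-bucket',   # hypothetical bucket name
    Key='results/part-0001.bin',  # hypothetical object key
    Body=payload,
)

# A successful PUT reports HTTP 200 in the response metadata.
assert response['ResponseMetadata']['HTTPStatusCode'] == 200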
Nic
  • This is exactly what I needed. For anyone wondering, my script now first uses pickle.dumps to create a bytes representation of the object, then uses put_object as described above to write directly to S3. To retrieve the object later, use get_object to fetch it from S3 and pickle.loads to unpickle it (a round-trip sketch along these lines follows the comments). – Richard Sun Jan 29 '18 at 04:58
  • What is the key? – Jwan622 Sep 16 '19 at 19:24
  • @Jwan622 that'd be your file name – Nic Sep 17 '19 at 02:22
  • Is this possible while still using the high level apis? ie boto3.transfer? – Famous Jameis Apr 20 '21 at 06:59
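
Following up on the pickle workflow described in the first comment above, here is a minimal round-trip sketch; the bucket and key names are hypothetical:

import pickle

import boto3

client = boto3.client('s3')
my_object = {'results': [1, 2, 3]}  # any picklable Python object

# Serialize in memory and write the bytes straight to S3.
client.put_object(
    Bucket='my-example-bucket',
    Key='processed/my_object.pkl',
    Body=pickle.dumps(my_object),
)

# Later: fetch the bytes back and unpickle them.
response = client.get_object(Bucket='my-example-bucket', Key='processed/my_object.pkl')
restored = pickle.loads(response['Body'].read())

As for the last comment, the higher-level transfer methods such as upload_fileobj accept any seekable file-like object, so wrapping the in-memory bytes in io.BytesIO should let you use them as well.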

The same thing works with the S3 put_object method, for example with a JSON string built in memory:

import json
import boto3

s3 = boto3.client('s3')
json_data = json.dumps({'example': 'payload'})  # built in memory, not on disk
key = 'filename'
response = s3.put_object(Bucket='Bucket_Name',
                         Body=json_data,
                         Key=key)
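
To read that JSON back later, again without touching local disk, a minimal sketch using get_object (bucket and key names as in the snippet above):

import json

import boto3

s3 = boto3.client('s3')

# Fetch the object and decode the JSON payload in memory.
response = s3.get_object(Bucket='Bucket_Name', Key='filename')
data = json.loads(response['Body'].read())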
Abdul Rehman