
I am using AWS S3 as my default file storage system. I have a model with a file field like so:

class Segmentation(models.Model):
    file = models.FileField(...)

I am running image processing jobs on a second server that dump processed images to a different AWS S3 bucket. I want to save the processed image in my Segmentation table.

Currently I am using boto3 to manually download the file to my "local" server (where my django-app lives) and then upload it to the local S3 bucket like so:

from django.core.files import File
import boto3
import os

def save_file(segmentation, foreign_s3_key):

    # set foreign bucket
    foreign_bucket = 'foreign-bucket'

    # create a temp file:
    temp_local_file = 'tmp/temp.file'

    # use boto3 to download the foreign file locally:
    s3_client = boto3.client('s3')
    s3_client.download_file(foreign_bucket, foreign_s3_key, temp_local_file)

    # save file to segmentation:
    with open(temp_local_file, 'rb') as f:
        segmentation.file = File(f)
        segmentation.save()

    # delete temp file:
    os.remove(temp_local_file)

This works fine, but it is resource-intensive. I have some jobs that need to process hundreds of images.

Is there a way to copy a file from the foreign bucket to my local bucket and set the segmentation.file field to the copied file?

Daniel
  • Does this answer your question? [Retrieve S3 file as Object instead of downloading to absolute system path](https://stackoverflow.com/questions/37087203/retrieve-s3-file-as-object-instead-of-downloading-to-absolute-system-path) – ranka47 Mar 11 '21 at 18:34
  • Not sure - can you please provide an example of how to implement this - wouldn't this use just as many resources? – Daniel Mar 11 '21 at 19:04
  • In this case you save the time spent writing the file to disk and reloading it, since the file object itself gets streamed to you. (How the retrieval happens still needs to be looked into, though.) – ranka47 Mar 11 '21 at 20:17
  • Does this answer your question? https://stackoverflow.com/questions/44043036/how-to-read-image-file-from-s3-bucket-directly-into-memory – ranka47 Mar 11 '21 at 20:21
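The in-memory route the comments describe could be sketched roughly like this (a sketch only; `fetch_in_memory` is an illustrative helper, not a boto3 API, and the client is passed in so the bucket/key names stay placeholders):

```python
import io

def fetch_in_memory(s3_client, bucket, key):
    # get_object returns the object with a StreamingBody under "Body";
    # reading it yields the raw bytes without touching local disk
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"]
    # wrap the bytes in a file-like object so they can be handed to
    # e.g. django.core.files.base.ContentFile and saved on the model field
    return io.BytesIO(body.read())
```

This avoids the temp-file round trip, but the bytes still travel through the Django host.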

1 Answer


I am assuming you want to move some files from a source bucket to a destination bucket, as the question title suggests, and do some processing in between.

import boto3

my_west_session = boto3.Session(region_name='us-west-2')
my_east_session = boto3.Session(region_name='us-east-1')
backup_s3 = my_west_session.resource("s3")
video_s3 = my_east_session.resource("s3")
local_bucket = backup_s3.Bucket('localbucket')
foreign_bucket = video_s3.Bucket('foreignbucket')

for obj in foreign_bucket.objects.all():
    # do some processing
    # on objects
    copy_source = {
        'Bucket': foreign_bucket.name,  # CopySource needs the bucket name, not the Bucket resource
        'Key': obj.key
    }
    local_bucket.copy(copy_source, obj.key)

See the boto3 docs on Session configuration, and on S3 Resource `copy` or `copy_object`, depending on your requirement.
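To also set the Django field without re-uploading, the copy can be combined with an assignment to the FileField's `name` (a sketch, assuming the default storage is an S3 backend such as django-storages' `S3Boto3Storage` pointed at the local bucket, so the FileField stores only the object key; `copy_and_attach` is an illustrative helper, not part of boto3 or Django):

```python
def copy_and_attach(segmentation, s3_client, source_bucket, dest_bucket, key):
    # server-side copy: S3 moves the bytes bucket-to-bucket, so nothing
    # passes through the Django host
    s3_client.copy_object(
        CopySource={'Bucket': source_bucket, 'Key': key},
        Bucket=dest_bucket,
        Key=key,
    )
    # point the FileField at the already-copied object instead of
    # re-uploading it through Django's storage layer
    segmentation.file.name = key
    segmentation.save()
```

Assigning `file.name` directly skips the storage backend's save path, so it only works if the key really exists in the bucket the storage is configured for.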

samtoddler
  • Thanks - I saw the copy function in another solution - now I am trying to assign the file field in my django `segmentation` instance to the copied file - any ideas there? – Daniel Mar 11 '21 at 19:14
  • @Daniel I did not get you, are you trying to access the attributes of the s3 object and assign them to your custom class fields? – samtoddler Mar 11 '21 at 19:22
  • Yes - but it's not a custom class - it's a Django model - and the field: `file = models.FileField()` is itself a Django class. – Daniel Mar 11 '21 at 19:23
  • @Daniel actually in the code you shared, you are not accessing the object's attributes; you are opening the file and creating a Django `File` object from it. As per [this doc](https://docs.djangoproject.com/en/3.1/topics/files/) you can do that. To be frank, I am not sure how your project works, and I am not too good with `django` either. – samtoddler Mar 11 '21 at 19:44