
I'm developing an application that will run in an elastic environment on AWS (EC2 instances with autoscaling). The whole app is written in PHP.

The core of the app is safely storing files in an S3 bucket. Since the user doesn't need to know where a file was saved, I thought I could store the file temporarily on the EC2 instance and then move it to S3 asynchronously, using a job queue (Amazon SQS), to avoid making the user wait twice and to handle S3 problems more gracefully (they aren't common, but they can happen).
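The flow I have in mind on the upload-handling instance would look roughly like this (a sketch only, assuming the AWS SDK for PHP v3 installed via Composer; the queue URL, bucket name, and paths are placeholders):

```php
<?php
require 'vendor/autoload.php';

use Aws\Sqs\SqsClient;

// 1. Persist the upload locally on this EC2 instance.
$localPath = '/tmp/uploads/' . uniqid('upload_', true);
move_uploaded_file($_FILES['file']['tmp_name'], $localPath);

// 2. Enqueue a job telling a worker where the file lives and where it should go.
$sqs = new SqsClient(['region' => 'us-east-1', 'version' => '2012-11-05']);
$sqs->sendMessage([
    'QueueUrl'    => 'https://sqs.us-east-1.amazonaws.com/123456789012/file-moves', // placeholder
    'MessageBody' => json_encode([
        'sourceHost' => gethostname(),  // the worker would fetch the file from this instance
        'localPath'  => $localPath,
        'bucket'     => 'my-bucket',    // placeholder
        'key'        => basename($localPath),
    ]),
]);
```

This is exactly where my autoscaling worry comes in: if this instance is terminated before a worker fetches the file, the local copy is gone.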

My questions are:

  1. Does this approach sound good, or am I missing something?
  2. When processing a job from the queue, will the worker instance have to connect to the original EC2 instance, retrieve the file from it, and then upload it to S3?
  3. How can I avoid problems with autoscaling? An instance could be terminated before the file is stored in the S3 bucket.
Nazareno Lorenzo
  • How large are the files expected to be? The bandwidth between an EC2 instance and S3 will be much higher than between most of your users and S3. Normally I transfer files directly upon upload, and use jobs for any post-processing that needs to happen on the file. – datasage Apr 16 '14 at 05:09
  • They shouldn't be larger than 15 MB. But doing this through the SQS queue lets me handle an S3 outage or problem. – Nazareno Lorenzo Apr 16 '14 at 05:29
  • By adding SQS, you now have to worry about SQS failing, the job not getting picked up, or the instance being terminated before the job can transfer the file. 15 MB is not large at all. I would simply consider the upload failed or incomplete if it can't be transferred to S3 on upload. Your upload time to S3 for a file of that size will only be a couple of seconds. – datasage Apr 16 '14 at 05:34
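The synchronous alternative datasage describes is small enough to show inline (a sketch assuming the AWS SDK for PHP v3; the bucket name and key are placeholders):

```php
<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;
use Aws\Exception\AwsException;

$s3 = new S3Client(['region' => 'us-east-1', 'version' => '2006-03-01']);

try {
    // Push the file straight to S3 during the request; a 15 MB object
    // transfers from EC2 to S3 in a few seconds at most.
    $s3->putObject([
        'Bucket'     => 'my-bucket', // placeholder
        'Key'        => 'uploads/' . bin2hex(random_bytes(8)),
        'SourceFile' => $_FILES['file']['tmp_name'],
    ]);
} catch (AwsException $e) {
    // S3 unreachable: treat the whole upload as failed and let the user retry.
    http_response_code(502);
    exit('Upload failed, please try again.');
}
```

Nothing is left sitting on the instance, so autoscaling terminations stop being a concern for in-flight files.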

1 Answer


Ideally, you don't want your main app server tied up during file uploads (first receiving the file, then pushing it on to S3).

CORS (Cross-Origin Resource Sharing) exists to avoid precisely this. You can upload the file to S3 directly from the client side and let Amazon worry about handling concurrent uploads from your users. Your app keeps doing what it does best without having to handle the uploads itself.

This SO question discusses the same issue, and there are several customisable plugins like Fine Uploader that wrap this up with progress bars, etc.

This completely removes the need for any kind of queue. If you need to do bookkeeping after the upload, you can simply make an AJAX call to your server with the file info once the upload completes. It should also address your concern about instances being removed by autoscaling, since everything happens client side.
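One common way to implement this on the server side is to hand the browser a pre-signed URL it can PUT the file to directly (a sketch assuming the AWS SDK for PHP v3 and a bucket whose CORS configuration allows PUT from your origin; the bucket name and key are placeholders):

```php
<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client(['region' => 'us-east-1', 'version' => '2006-03-01']);

// Build a PutObject command for a fresh key, then sign it.
$cmd = $s3->getCommand('PutObject', [
    'Bucket' => 'my-bucket',                        // placeholder
    'Key'    => 'uploads/' . bin2hex(random_bytes(8)),
]);

// The signed URL is valid for 15 minutes; return it to the client, which
// uploads the file to S3 directly, bypassing your app server entirely.
$request = $s3->createPresignedRequest($cmd, '+15 minutes');
echo (string) $request->getUri();
```

The bookkeeping AJAX call mentioned above would then carry the object key back to your server once the browser's PUT succeeds.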

Nikhil