14

I'm building an application that accepts large numbers of photo uploads at once, and I'd like to know what the best setup would be to handle this.

This is what I am using so far:

  • jQuery File Upload: allows users to drag and drop images
  • CarrierWave: processes images and resizes them with ImageMagick
  • Amazon S3: CarrierWave uploads images to Amazon S3 through Fog
  • Heroku: for hosting

I'd like users to be able to drag and drop a large number of images onto a page, and then navigate to other pages while the upload continues in the background. I'd also like pictures to appear as they finish uploading. I don't want this process to lock up the Heroku dynos, so I probably need to move the work to a background job, but I'm not sure what to use for my situation.

What's the best setup for this type of app? What background worker gem should I use? Is Cloudinary a good idea?

Jonathan Sutherland

2 Answers

34

I recently built an application which accepts a large number of uploads on Heroku. I decided to build my own solution instead of using Cloudinary or an equivalent. Here are some lessons I learned:

  • Don't upload to Heroku. Your web worker will be locked up for the entire duration of the upload. That's up to a minute. Unacceptable.

  • Use a JavaScript uploader (like jquery-file-upload) to upload directly to S3. This is a little complicated at first, but once you get it working it's fantastic. You can use the s3_direct_upload gem, or you can just read its source to build your own solution from scratch. That gem was based on a RailsCasts Pro episode, which you have to pay for, but whose source is available.

  • When the upload finishes, make an Ajax request to your application, passing the new S3 URL as a remote URL. CarrierWave will then fetch and process the image from S3 as if it had been uploaded, except in only a couple of seconds instead of up to a minute.

  • Use jquery-file-upload's client-side image resizing. Somebody's going to try to upload a 5 MB photo and then bitch that the upload takes forever. Resizing in the browser first keeps every upload as small, and therefore as fast, as possible.

  • Configure S3 to clear your uploads folder automatically.

  • Don't use Thin. Use Unicorn. A couple of seconds is too long to be processing a request on Thin, but Unicorn with three or four workers is much more forgiving.

  • Don't use RMagick. It has a better API for complex image manipulation but uses amazing amounts of memory. Use mini_magick instead.
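To make the direct-to-S3 step a little more concrete, here is a minimal sketch of how a server might sign the form fields a browser needs for an S3 POST upload. It is not the s3_direct_upload gem's code: it uses AWS Signature Version 4 with only the Ruby standard library, and the function name `s3_post_fields`, the bucket name, the `uploads/` prefix, the size limit and the credentials are all placeholders chosen for illustration.

```ruby
require "base64"
require "json"
require "openssl"

# Sketch: build the signed form fields for a browser-based POST upload to S3.
# All names and defaults here are illustrative assumptions, not a library API.
def s3_post_fields(bucket:, key_prefix:, access_key:, secret_key:, region:,
                   now: Time.now, expires_in: 3600, max_bytes: 10 * 1024 * 1024)
  now        = now.getutc
  date_stamp = now.strftime("%Y%m%d")
  amz_date   = now.strftime("%Y%m%dT%H%M%SZ")
  credential = "#{access_key}/#{date_stamp}/#{region}/s3/aws4_request"

  # The policy limits what the browser is allowed to upload with these fields.
  policy = {
    "expiration" => (now + expires_in).strftime("%Y-%m-%dT%H:%M:%S.000Z"),
    "conditions" => [
      { "bucket" => bucket },
      ["starts-with", "$key", key_prefix],
      ["content-length-range", 0, max_bytes],
      { "x-amz-algorithm" => "AWS4-HMAC-SHA256" },
      { "x-amz-credential" => credential },
      { "x-amz-date" => amz_date }
    ]
  }
  encoded_policy = Base64.strict_encode64(JSON.generate(policy))

  # Signature Version 4 key derivation: date -> region -> service -> request.
  sha256    = OpenSSL::Digest.new("sha256")
  k_date    = OpenSSL::HMAC.digest(sha256, "AWS4#{secret_key}", date_stamp)
  k_region  = OpenSSL::HMAC.digest(sha256, k_date, region)
  k_service = OpenSSL::HMAC.digest(sha256, k_region, "s3")
  k_signing = OpenSSL::HMAC.digest(sha256, k_service, "aws4_request")
  signature = OpenSSL::HMAC.hexdigest(sha256, k_signing, encoded_policy)

  {
    # S3 substitutes ${filename} with the name of the uploaded file.
    "key"              => key_prefix + '${filename}',
    "policy"           => encoded_policy,
    "x-amz-algorithm"  => "AWS4-HMAC-SHA256",
    "x-amz-credential" => credential,
    "x-amz-date"       => amz_date,
    "x-amz-signature"  => signature
  }
end
```

The returned hash would be rendered as hidden fields in the upload form (or handed to the JavaScript uploader), so the secret key never leaves the server and the file never passes through a dyno.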

You'll note that I'm not using a background worker for any of this. If you're feeling really meticulous, you could have the controller that receives the remote URL pass its work to a background worker, and if you need the result immediately the background worker could notify the UI by pubsub (Faye or Pusher, possibly with the exciting new sync gem). But this wasn't necessary for my application, and I'd rather spend my money on another web dyno than a worker dyno.
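If you did want to go the worker route, the hand-off looks roughly like this. The sketch below only illustrates the shape of the pattern (enqueue and return immediately, process elsewhere, then notify) using an in-process thread and Ruby's built-in `Queue`. On Heroku the worker would instead be a separate dyno driven by a job library, and the callback would be a pubsub publish. The class name `ImageJobQueue` and the `process` placeholder are hypothetical.

```ruby
# Sketch of the enqueue / process / notify hand-off using only the standard
# library. ImageJobQueue is a hypothetical stand-in for a real job queue.
class ImageJobQueue
  # The block stands in for a pubsub publish (for example via Faye or Pusher).
  def initialize(&on_done)
    @queue   = Queue.new
    @on_done = on_done
    @worker  = Thread.new { work_loop }
  end

  # Called from the controller action: enqueue the URL and return right away.
  def enqueue(remote_url)
    @queue << remote_url
    :queued
  end

  # Finish the jobs already queued, then stop the worker thread.
  def shutdown
    @queue << :stop
    @worker.join
  end

  private

  def work_loop
    while (job = @queue.pop) != :stop
      @on_done.call(process(job))
    end
  end

  # Placeholder for the real work (fetching the file from S3 and resizing it).
  def process(remote_url)
    { url: remote_url, status: "processed" }
  end
end

finished = []
jobs = ImageJobQueue.new { |result| finished << result }
jobs.enqueue("https://example-bucket.s3.amazonaws.com/uploads/cat.jpg")
jobs.shutdown
```

The point of the design is that `enqueue` costs almost nothing, so the web request returns immediately and the slow part happens off the request path.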

And, yeah, if you want to let them click around your whole application while that's happening, you're going to need to either upload in a popup (and use some kind of pubsub solution), or build your whole site as a JavaScript application using Ember or Backbone or Angular or whatever.

Any questions?

Taavo
  • There are responsiveness issues with doing any processing, even just a few seconds, on a web dyno. Heroku's load-balancing layer is no longer aware of whether each dyno is actually available to handle a request. Requests are randomly load balanced across all dynos, so requests from other users may end up waiting 3-5 seconds behind your image-processing request, even though another dyno could have handled them immediately. It's always best to keep requests super-quick on Heroku. [Source article](http://rapgenius.com/James-somers-herokus-ugly-secret-lyrics) – Brian McKelvey May 03 '13 at 01:45
  • 1
    Agreed. That's why Heroku [changed their official recommendation](https://blog.heroku.com/archives/2013/2/27/unicorn_rails) from Thin to Unicorn, where the problem is much less pronounced. The best possible architecture involves workers, but it may not be necessary for all applications and certainly won't have the same impact as, say, direct S3 uploads. – Taavo May 03 '13 at 03:58
6

I'd never seen Cloudinary before your mention, but it seems like it'd be a great fit for your project.

First and foremost, it could greatly simplify your app. Cloudinary supports direct uploads from the browser via its HTTP API, and there's already a jQuery plugin for it which is based on jQuery File Upload and has similar features, including client-side pre-upload processing.

Furthermore, it supports on-the-fly transformations similar to Dragonfly (also a very nice lib).

This means that, unless you really need to upload those images through your app, you can bypass it completely, uploading straight to Cloudinary and handling image cropping and other transformations through their transformation API.
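As a rough illustration of what "on-the-fly" means here: Cloudinary encodes the transformation in the delivery URL itself, so a resized variant is just a different URL for the same uploaded image. The helper below is a hand-rolled sketch rather than part of Cloudinary's gem. The name `cloudinary_url`, the cloud name `my-cloud` and the public ID `sample_photo` are assumptions for the example.

```ruby
# Sketch: build a Cloudinary delivery URL that requests a resized, cropped
# variant of an already-uploaded image. The helper name is hypothetical.
def cloudinary_url(cloud_name, public_id, width:, height:, crop: "fill", format: "jpg")
  # w_ = width, h_ = height, c_ = crop mode, joined by commas in one URL segment.
  transformation = "w_#{width},h_#{height},c_#{crop}"
  "https://res.cloudinary.com/#{cloud_name}/image/upload/#{transformation}/#{public_id}.#{format}"
end

thumb = cloudinary_url("my-cloud", "sample_photo", width: 300, height: 200)
```

Because the variant is derived when the URL is requested, the app itself never has to run ImageMagick or store resized copies.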

You could eliminate CarrierWave and S3 from your app if desired, and there'd of course be no need for any background dynos to handle image processing. Additionally, it'd likely be much faster (direct upload and on-the-fly manipulation vs. uploading to your app, processing, then uploading to the cloud), and would eliminate the bandwidth used by uploading through your app.

Even without direct upload, it seems that Cloudinary provides a CarrierWave plugin which could still make use of their transformation API, obviating the need for your app to process images.

numbers1311407
  • Would this allow users to post a group of photos that then get uploaded in the background (so they can leave the page without stopping the upload process)? – Jonathan Sutherland May 01 '13 at 06:36
  • I don't see this happening without opening a popup window to handle the upload. Otherwise when you leave the page, it's going to interrupt the upload. There's no backgrounding of processes in a web browser that I know of. – numbers1311407 May 01 '13 at 16:15
  • However, depending on what you're doing you still might add some kind of push component to the stack, like Faye or some WebSocket implementation, which would let you push notifications for uploaded images. – numbers1311407 May 01 '13 at 16:31