
Overview:

I have a photobooth that takes pictures and sends them to my web application. The web application then stores the user's data and posts the picture to the user's Facebook profile/fan page.

My web app runs Ruby on Rails on the Heroku Cedar stack.

Flow:

  1. My webapp receives the photo from the photobooth via a POST, like a web form.
  2. The booth waits for the server response. If the upload has failed, it will send the picture again.
  3. The webapp only sends its response after the Facebook upload has completed.

Problems:

The webapp only responds to the photobooth after all processing has completed. This often takes more than 30 seconds, which causes Heroku to fire an H12 Request Timeout.
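For reference, the current action looks roughly like this (a simplified sketch; the controller and helper names are illustrative, but the point is that the Facebook upload happens inside the request):

```ruby
# app/controllers/photos_controller.rb -- simplified sketch of the current flow.
# Names are illustrative; the Facebook upload blocks the request until it finishes.
class PhotosController < ApplicationController
  def create
    photo = params[:photo]  # multipart upload from the booth
    graph = Koala::Facebook::API.new(current_user_facebook_token) # token lookup assumed
    # Blocks until Facebook has received the picture, often > 30 s on a slow link:
    graph.put_picture(photo.tempfile.path, photo.content_type)
    render json: { status: "ok" }  # the booth only gets its response here
  end
end
```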

Solutions?

Keep the request alive while the file is being uploaded (return some response data in order to prevent Heroku from firing an H12: https://devcenter.heroku.com/articles/http-routing#timeouts). Is this possible? How can I achieve it in Ruby?
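Something along these lines is what I have in mind (a rough Rack-level sketch only; I have not verified that Heroku's router treats these padding bytes as activity, and Rack/Rails middleware may buffer the body, so this would need testing):

```ruby
# Rough sketch: a streaming Rack body that trickles padding bytes while the
# slow Facebook upload runs in a thread, then sends the real result at the end.
class KeepAliveBody
  def initialize(&work)
    @thread = Thread.new(&work)      # run the slow upload in the background
  end

  def each
    until @thread.join(10)           # join returns nil every 10 s until the work is done
      yield " "                      # emit a byte so the connection shows activity
    end
    yield @thread.value.to_s         # finally emit the actual response payload
  end
end

# Hypothetical usage from a Rack endpoint:
#   [200, { "Content-Type" => "text/plain" },
#    KeepAliveBody.new { upload_to_facebook(photo) }]
```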

Change to Unicorn + Nginx and activate the Upload Module (this way the dyno only receives the request after the upload has completed - Unicorn + Rails + Large Uploads). Is this really possible?

Use the rack-timeout gem. This would make a lot of my passthrough uploads fail, so the pictures would never be posted to Facebook, right?
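For reference, wiring that up would be roughly this (a sketch only; the setting name varies between gem versions):

```ruby
# Gemfile
#   gem "rack-timeout"

# config/initializers/rack_timeout.rb -- older releases use this class-level
# setter; newer ones prefer environment variables such as RACK_TIMEOUT_SERVICE_TIMEOUT.
Rack::Timeout.timeout = 25   # abort any request still running after ~25 seconds
```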

Change the architecture. Upload directly to S3, spin up a worker to check for new pictures in the S3 bucket, download them, and send them to Facebook. This might be the best option, but it takes a lot of time and effort. I might go for it in the long term, but I'm looking for a fast solution right now.
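If I went that way, the worker half could look roughly like this (bucket name, polling approach, and token handling are all assumptions on my part):

```ruby
# Hypothetical polling worker: look for new booth photos in an S3 bucket,
# post them to Facebook with Koala, then move them to a "processed/" prefix
# so they are not posted twice.
require "fog"
require "koala"
require "tempfile"

storage = Fog::Storage.new(
  provider:              "AWS",
  aws_access_key_id:     ENV["AWS_ACCESS_KEY_ID"],
  aws_secret_access_key: ENV["AWS_SECRET_ACCESS_KEY"]
)
bucket = storage.directories.get("photobooth-uploads")   # assumed bucket name

loop do
  bucket.files.each do |file|
    next if file.key.start_with?("processed/")

    Tempfile.open(["photo", ".jpg"]) do |tmp|
      tmp.binmode
      tmp.write(file.body)                                 # download from S3
      tmp.flush
      graph = Koala::Facebook::API.new(ENV["FACEBOOK_TOKEN"]) # per-user token lookup omitted
      graph.put_picture(tmp.path, "image/jpeg", message: "Photobooth!")
    end

    file.copy("photobooth-uploads", "processed/#{file.key}") # mark as done
    file.destroy
  end
  sleep 30   # poll interval
end
```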

Other...

Rafael Oliveira
  • What part of this process is taking 30 seconds? How big are these photos? – willglynn Sep 24 '12 at 16:25
  • The whole process. From picture upload (boot to webapp) till picture is posted on facebook. – Rafael Oliveira Sep 24 '12 at 16:27
  • ...sure, but where is all that time going? It's difficult to speed things up if you don't know why they're slow. Heroku servers are fast and well-connected; it shouldn't take 30 seconds to move a photo. – willglynn Sep 24 '12 at 16:32
  • I think it happens when the booth connection (it uses a mobile internet connection, and those in Brazil are not really reliable) is dropped. The photo upload gets stuck and the server keeps waiting for the upload. Then the booth reconnects and tries another dyno to send the photo, drops again, blocks this second dyno, and so forth. – Rafael Oliveira Sep 24 '12 at 16:36

2 Answers


More info on this issue.

From Rapgenius: http://rapgenius.com/Lemon-money-trees-rap-genius-response-to-heroku-lyrics

Ten days ago, spurred by a minor problem serving our compiled javascript, we started running a lot of ab benchmarks. We noticed that the numbers we were getting were consistently worse than the numbers reported to us by Heroku and their analytics partner New Relic. For a static copyright page, for instance, Heroku reported an average response time of 40ms; our tools said 6330ms. What could account for such a big difference?

“Requests are waiting in a queue at the dyno level,” a Heroku engineer told us, “then being served quickly (thus the Rails logs appear fast), but the overall time is slower because of the wait in the queue.”

Waiting in a queue at the dyno level? What?

From Heroku: https://blog.heroku.com/archives/2013/2/16/routing_performance_update

Over the past couple of years Heroku customers have occasionally reported unexplained latency on Heroku. There are many causes of latency—some of them have nothing to do with Heroku—but until this week, we failed to see a common thread among these reports. We now know that our routing and load balancing mechanism on the Bamboo and Cedar stacks created latency issues for our Rails customers, which manifested themselves in several ways, including:

  • Unexplainable, high latencies for some requests
  • Mismatch between reported queuing and service time metrics and the observed reality
  • Discrepancies between documented and observed behaviors

For applications running on the Bamboo stack, the root cause of these issues is the nature of routing on the Bamboo stack coupled with gradual, horizontal expansion of the routing cluster. On the Cedar stack, the root cause is the fact that Cedar is optimized for concurrent request routing, while some frameworks, like Rails, are not concurrent in their default configurations.

We want Heroku to be the best place to build, deploy and scale web and mobile applications. In this case, we’ve fallen short of that promise. We failed to:

  • Properly document how routing works on the Bamboo stack
  • Understand the service degradation being experienced by our customers and take corrective action
  • Identify and correct confusing metrics reported from the routing layer and displayed by third party tools
  • Clearly communicate the product strategy for our routing service
  • Provide customers with an upgrade path from non-concurrent apps on Bamboo to concurrent Rails apps on Cedar
  • Deliver on the Heroku promise of letting you focus on developing apps while we worry about the infrastructure

We are immediately taking the following actions:

  • Improving our documentation so that it accurately reflects how our service works across both Bamboo and Cedar stacks
  • Removing incorrect and confusing metrics reported by Heroku or partner services like New Relic
  • Adding metrics that let customers determine queuing impact on application response times
  • Providing additional tools that developers can use to augment our latency and queuing metrics
  • Working to better support concurrent-request Rails apps on Cedar

The remainder of this blog post explains the technical details and history of our routing infrastructure, the intent behind the decisions we made along the way, the mistakes we made and what we think is the path forward.
Jamie Folsom
  • I've had some good responses from heroku about this issue, which at the very least cost us some time and money, if not downtime. They have indicated that they "will not be fixing their router" because they "feel very good about the technical decisions they made in designing it". However, they'll be offering dynos with twice the memory, for twice the price, and they are now recommending Unicorn as the rails app server on cedar. Unicorn essentially routes requests internally, handling 2-4 requests simultaneously on a single dyno. FWIW. – Jamie Folsom Apr 02 '13 at 21:29

1) You can use Unicorn as your app server and set the timeout after which the Unicorn master kills a worker to a number of seconds greater than your requests need. A sketch of such a setup is shown below.

Nginx does not work on Heroku, so that is not an option.
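A minimal config/unicorn.rb along those lines might look like this (worker count and timeout are illustrative values; note that this timeout governs when the Unicorn master kills a worker, while Heroku's router applies its own 30-second limit separately):

```ruby
# Procfile (assumed):
#   web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb

# config/unicorn.rb -- minimal sketch, values illustrative
worker_processes 3
timeout 120          # master only kills a worker after 120 s
preload_app true

before_fork do |server, worker|
  # release the database connection held by the master before forking
  defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|
  # each worker needs its own database connection
  defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
end
```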

2) Changing the architecture would work well too, though I would choose an option where the upload traffic does not block my own server, such as TransloadIt. They will help you get the pictures to S3, for example, and do custom transformations, cropping, etc., without you having to add additional dynos because your processes are blocked by file uploads.

Addition: 3) Another architectural change would be to handle only the receiving part in one action and hand the upload-to-Facebook task off to a background worker (using, for example, Sidekiq).
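A rough sketch of such a worker (the stored-photo URL and token arguments are illustrative; the controller would need to persist the photo somewhere a worker dyno can reach, e.g. S3):

```ruby
# app/workers/facebook_upload_worker.rb -- hypothetical sketch: the controller
# stores the photo and enqueues the Facebook upload, so the booth gets its
# response immediately instead of waiting for Facebook.
class FacebookUploadWorker
  include Sidekiq::Worker

  def perform(photo_url, access_token)
    graph = Koala::Facebook::API.new(access_token)
    graph.put_picture(photo_url)   # Koala also accepts a publicly reachable URL
  end
end

# In the controller, after storing the file:
#   FacebookUploadWorker.perform_async(photo_s3_url, user.facebook_token)
#   render json: { status: "queued" }
```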

Thomas Klemm
  • I would have to set the Unicorn timeout to something like 120 seconds, otherwise I would have the same problem that rack-timeout would give me, am I right? For direct upload I was thinking of CarrierWave Direct; I've used it in the past. – Rafael Oliveira Sep 24 '12 at 16:32
  • I would not compare rack-timeout with the timeout of Unicorn workers. Unicorn works like this: on a single Heroku dyno there is one master process and three to four worker processes. The master process never handles requests itself; it only supervises the worker processes and kills them if they are stuck, that is, if they do not finish responding to a web request within the specified timeout. So a 120-second timeout could work well. – Thomas Klemm Sep 24 '12 at 16:41
  • About your edit: to make this happen I would still need to save the file somewhere other than Heroku, because Heroku doesn't allow me to access the filesystem. – Rafael Oliveira Sep 24 '12 at 17:17
  • That's certainly right, you would need to store it somewhere for the time being. As you do not have file system access, you could for example write the file to Amazon S3 using the fog library as described [here](http://fog.io/0.8.1/storage/). That being said, this is all in all a pretty tedious approach for a task that surely has a cleaner and nicer solution, since developers stumble upon it all the time. Right now you are keeping the object in memory and sending it to Facebook in the same action using Koala? – Thomas Klemm Sep 24 '12 at 18:29
  • By the way, are you using any gem to help with uploading, such as [dragonfly](http://markevans.github.com/dragonfly/file.README.html), paperclip, or carrierwave? If so, it might ship with an option for storing the files on S3. One option recommended in [this thread](http://stackoverflow.com/questions/6127079/rails-direct-upload-to-amazon-s3) is [CarrierWave Direct](https://github.com/dwilkie/carrierwave_direct). Your background worker could then upload the file to Facebook, reading it from S3. – Thomas Klemm Sep 24 '12 at 18:49
  • Yep, I'm saving it in memory and sending it to Facebook using Koala. I think I will have to go for the CarrierWave solution, but I was hoping to find a simpler one in the meantime. – Rafael Oliveira Sep 24 '12 at 19:30
  • Heroku does let you access the filesystem -- it just doesn't guarantee that things you write to the filesystem will stick around for any particular period of time. Files stay alive on the dyno's filesystem for as long as the dyno stays alive, which is generally on the order of 24 hours. – willglynn Sep 24 '12 at 20:23
  • I did not know that. However, even if you were to write a file to a dyno's file system, you would not be able to access it from a background worker process, which usually runs in a separate dyno. – Thomas Klemm Sep 24 '12 at 20:33
  • From the [Heroku Docs](https://devcenter.heroku.com/articles/dynos#ephemeral-filesystem): `**Ephemeral filesystem** (*Heroku Cedar stack only*) Each dyno gets its own ephemeral filesystem, with a fresh copy of the most recently deployed code. During the dyno’s lifetime its running processes can use the filesystem as a temporary scratchpad, but no files that are written are visible to processes in any other dyno and any files written will be discarded the moment the dyno is stopped or restarted.` – Thomas Klemm Sep 24 '12 at 20:34
  • @ThomasKlemm If I set my unicorn timeout value high (such as 120 seconds), but set my rack-timeout gem to a lower value like 30 seconds, during long requests, how would the rack-timeout gem affect the unicorn process? Would heroku/rack-timeout "kill" the request for the user, while the Unicorn dyno still works to process? Or would the rack-timeout gem also kill the work being done for that request by the unicorn process? – Kelsey Hannan Jan 19 '15 at 20:36
  • @Kelseydh Not sure, I'd need to research that, too. – Thomas Klemm Jan 19 '15 at 21:33