
I have a Tinder-style app that allows users to rate events. After a user rates an event, a background Resque job runs that re-ranks the other events based on that user's feedback.

This background job takes about 10 seconds and it runs about 20 times a minute per user.

To use a simple example: if I have 10 users on the app at any given time, and I never want a job to be left waiting, what's the optimal way to set this up?

I'm confused about dynos, Resque pools, and Redis connections. Can someone help me understand the difference? Is there a way to calculate this?

Jackson Cunningham

4 Answers


Not sure you're asking the right question. Your real question is "how can I get better performance?" Not "how many dynos?" Just adding dynos won't necessarily give you better performance. More dynos give you more memory...so if your app is running slowly because you're running out of available memory (i.e. you're running on swap), then more dynos could be the answer. If those jobs take 10 seconds each to run, though...memory probably isn't your actual problem. If you want to monitor your memory usage, check out a visualization tool like New Relic.

There are a lot of approaches to solving your problem, but I would start with the code that you wrote. Posting some code on SO might help us understand why that job takes 10 seconds (post some code!). 10 seconds is a long time, so optimizing the queries inside that job would almost surely help.

Another piece of low-hanging fruit: switch from Resque to Sidekiq for your background jobs. It's really easy to use, you'll use less memory, and you should see an instant bump in performance.

toddmetheny

Dynos: Individual virtual/physical servers. Think of them as being the same as EC2 instances.

Redis connections: Individual connections to the Redis instance.

Resque pool: A gem that allows you to run multiple Resque workers concurrently on the same dyno/instance.
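For instance, resque-pool reads a YAML file (conventionally config/resque-pool.yml) mapping queue names to the number of worker processes it forks on each dyno. The queue names and counts below are purely illustrative:

```yaml
# config/resque-pool.yml -- illustrative queue names and worker counts
ranking: 5    # five worker processes pulling from the "ranking" queue
default: 1    # one worker process for everything else
```

Each forked worker holds its own Redis connection, so the pool size feeds directly into how large a Redis connection limit you need.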

Dean Galvin
  • So if I'm finding that I have too many jobs in my queue, do I need more dynos or redis connections or something else? – Jackson Cunningham Oct 13 '16 at 21:19
  • Correct, you need to up the dyno count on that worker if the jobs are backing up. You also need to make sure you have a Redis instance that can handle the number of workers you have. – Dean Galvin Oct 13 '16 at 23:30

First of all, it’s worth looking for ways in which you can improve the performance of the job itself. You might be able to get it below ten seconds by using low level model caching or optimizing your algorithm.

In terms of working out how many workers you would need: take the number of runs per minute (20), times the number of seconds each run takes (10), times the number of users (10). That gives you the number of seconds of work generated per minute: 20 * 10 * 10 = 2000. Divide that by 60 and you get the number of worker-minutes of work arriving every minute: 33.3. So if you had 34 workers, and these numbers were all consistent, they should be able to keep on top of things.
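That arithmetic can be sanity-checked in a few lines of Ruby (the numbers are the ones from the question):

```ruby
# Worker-count estimate: total seconds of work generated per minute,
# divided by the 60 seconds of capacity each worker supplies per minute.
runs_per_minute_per_user = 20
seconds_per_job          = 10
users                    = 10

work_seconds_per_minute = runs_per_minute_per_user * seconds_per_job * users  # 2000
workers_needed = (work_seconds_per_minute / 60.0).ceil
workers_needed # => 34
```

Rounding up matters here: 33 workers would fall behind by about 20 seconds of work every minute, and the queue would grow without bound.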

That said, you shouldn’t be in a position where you need to run 34 or more dynos for just 10 concurrent users of a ranking algorithm. That’s going to get expensive very quickly.

Optimise your algorithm, try to add more caching, and give Sidekiq a try too. In my experience, Sidekiq can process a queue up to 10 times faster than Resque. It depends what your job is doing, and how you utilize each tool, but it's worth checking out. See Sidekiq vs Resque.

Joel Drapper

Re-ranking other events is a bad idea.

You should consider adding total_points and average_points columns to the events table and letting the ranks be decided by ORDER BY queries, like this:

class Event < ActiveRecord::Base
  has_many :feedbacks

  # Highest-ranked (most points) first
  scope :rank_by_total,   -> { order(total_points: :desc) }
  scope :rank_by_average, -> { order(average_points: :desc) }
end

class Feedback < ActiveRecord::Base
  belongs_to :event
  after_create :update_points

  # Refresh the event's cached totals whenever new feedback arrives
  def update_points
    total = event.feedbacks.sum(:points)
    avg   = event.feedbacks.average(:points)
    event.update(total_points: total, average_points: avg)
  end
end
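The same bookkeeping can even be done incrementally, without re-summing every feedback row on each new rating. A plain-Ruby sketch of that arithmetic (the EventStats struct and method name here are illustrative, not part of the models above):

```ruby
# Illustrative running-total bookkeeping: keep a count and a total,
# and derive the average, instead of re-aggregating all rows.
EventStats = Struct.new(:feedback_count, :total_points, :average_points)

def record_feedback(stats, points)
  stats.feedback_count += 1
  stats.total_points   += points
  stats.average_points  = stats.total_points.to_f / stats.feedback_count
  stats
end

stats = EventStats.new(0, 0, 0.0)
record_feedback(stats, 4)
record_feedback(stats, 2)
stats.total_points   # => 6
stats.average_points # => 3.0
```

In a real app the equivalent would be a counter column plus an atomic UPDATE, which stays O(1) per rating no matter how large the feedbacks table grows.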

So, how many workers/dynos do you need?

You don't need to worry about dynos or workers for this problem. No matter how many dynos with higher processing power you use, the re-ranking solution will take a good amount of time once your events table becomes huge. So try changing your solution the way I have described.

dnsh