
As you can see from the attached image, I've got a couple of workers that seem to be stuck. Those processes shouldn't take longer than a couple of seconds.

[Screenshot: Resque web dashboard showing the stuck workers]

I'm not sure why they won't clear or how to manually remove them.

I'm on Heroku using Resque with Redis-to-Go and HireFire to automatically scale workers.

Shpigford
  • Hi, semi-related question: how did you get the resque-web dashboard via heroku? I can't seem to figure out how to open it. – Aaron Marks Mar 05 '14 at 15:45

16 Answers


None of these solutions worked for me; I would still see this in resque-web:

0 out of 10 Workers Working

Finally, this worked for me to clear all the workers:

Resque.workers.each {|w| w.unregister_worker}
hagope
  • This worked for me. It unregistered *all* workers, which was a bit annoying. But this followed by `heroku restart` seemed to do the trick. It now shows the correct number of workers. – Brian Armstrong Aug 14 '12 at 05:27
  • This took the workers out of the web interface, but they actually still show up as processes and also "stole" jobs from the queue. – txwikinger Sep 05 '13 at 23:15
  • If you want to unregister only the workers that are not actual processes (and perhaps processing jobs), you might want to try `Resque.workers.each {|w| matches = w.id.match(/^[^:]*:([0-9]*):[^:]*$/); pid = matches[1]; w.unregister_worker unless w.worker_pids.include?(pid.to_s)}`, which will only unregister those workers whose pids are not part of the known running pids (see the expanded version after these comments). I do not know if this works in all environments, but it works well on Ubuntu. This might only work when your workers are on the same machine that you run this code on. – roychri Sep 25 '13 at 18:01
  • As an option: `Resque.workers.map(&:unregister_worker)` – A B Apr 23 '14 at 05:04
  • How come this doesn't include a check for whether the worker *should* be unregistered before calling `unregister_worker`? Is there a way to determine this? – boo-urns May 10 '15 at 04:51
  • Be aware that this does not get rid of worker processes. – Matheus Santana Apr 28 '17 at 18:38
  • Here's one way that *seems* to work in resque 2.0.0 to remove only the apparently dead workers: `Resque::Worker.all_workers_with_expired_heartbeats.each { |w| w.unregister_worker }` – jrochkind Jun 09 '21 at 15:58
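
Expanding on roychri's comment above, here is the same idea written out more readably; the same caveat applies, in that it assumes you run it on the machine the workers were started on:

# Unregister only workers whose recorded pid is not among the Resque worker
# pids actually running on this machine (expanded from the one-liner above).
Resque.workers.each do |worker|
  # Worker ids look like "hostname:pid:queue1,queue2"
  pid = worker.id.split(':')[1]
  worker.unregister_worker unless worker.worker_pids.include?(pid)
end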

In your console:

queue_name = "process_numbers"
Resque.redis.del "queue:#{queue_name}"

Otherwise, you can try to remove them by faking them as done:

Resque::Worker.working.each {|w| w.done_working}

EDIT

A lot of people have been upvoting this answer, so I feel it's important to point out hagope's solution, which unregisters workers from a queue, whereas the code above deletes queues. If you're happy to fake them as done, then cool.
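
If you do go the queue-deletion route, it may be worth checking what is still sitting in the queue first. These are standard Resque console calls, using the example queue name from above:

# Inspect the queue before deleting it outright
queue_name = "process_numbers"
Resque.size(queue_name)        # number of pending jobs in the queue
Resque.peek(queue_name, 0, 10) # first ten queued job payloads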

Simpleton

You probably have the resque gem installed, so you can open the console and list the current workers:

Resque.workers

It returns a list of workers

#=> [#<Worker infusion.local:40194-0:JAVA_DYNAMIC_QUEUES,index_migrator,converter,extractor>]

Then call prune_dead_workers on any of them (the first one, for example); it unregisters every worker registered from the current host whose process is no longer running:

Resque.workers.first.prune_dead_workers
Shairon Toledo

Adding to hagope's answer, I wanted to be able to unregister only workers whose current job had been running for a certain amount of time. The code below will only unregister workers whose job has been running for over 300 seconds (5 minutes).

Resque.workers.each {|w| w.unregister_worker if w.processing['run_at'] && Time.now - w.processing['run_at'].to_time > 300}

I have an ongoing collection of Resque-related Rake tasks to which I have also added this: https://gist.github.com/ewherrmann/8809350
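
If you want to see what those long-running workers are actually doing before unregistering them, something like this (standard Resque console calls) should print each busy worker's current job and start time:

# Print what each busy worker is working on and since when
Resque.working.each do |w|
  job = w.processing
  next if job.nil? || job.empty?
  puts "#{w.id}: #{job['payload']['class']} since #{job['run_at']}"
end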

ewH
  • Points for showing how to access the job start time via processing['run_at']. I've seen other solutions that use the .started method, but that actually returns the time the *worker* was started, not the job, which is the wrong approach for clearing stuck workers. Thanks! – Lachlan Cotter Mar 24 '14 at 02:06

Run this command on the machine where you started the workers:

$ ps -e -o pid,command | grep [r]esque

You should see something like this:

92102 resque: Processing ProcessNumbers since 1253142769

Make note of the PID (process id); in my example it is 92102.

Then you can quit the process in one of two ways:

  • Gracefully: kill -QUIT 92102 (the worker finishes its current job, then exits)

  • Forcefully: kill -TERM 92102 (the current job is killed immediately)

Let me know if you have any trouble.
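
If you'd rather find those pids from a Rails/Resque console instead of ps, the worker ids registered in Redis embed them (the format is host:pid:queues), so a quick listing like this should work:

# List registered workers and the pid embedded in each worker id
Resque.workers.each do |w|
  host, pid, queues = w.id.split(':')
  puts "pid #{pid} on #{host} (queues: #{queues})"
end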

jBeas

I just did:

% rails c production
irb(main):001:0>Resque.workers

Got the list of workers.

irb(main):002:0>Resque.remove_worker(Resque.workers[n].id)

... where n is the zero-based index of the unwanted worker.

user2811637

I had a similar problem: Redis saved its DB to disk with invalid (non-running) workers included, so they reappeared each time Redis/Resque was started.

Fix this using:

Resque::Worker.working.each {|w| w.done_working}
Resque.redis.save # Save the DB to disk without ANY workers

Make sure you restart Redis and your Resque workers.

joost

I recently started working on https://github.com/shaiguitar/resque_stuck_queue/. It's not a solution for fixing stuck workers, but it addresses the broader issue of resque hanging/being stuck, so I figured it could be helpful for people on this thread. From the README:

"If resque doesn't run jobs within a certain timeframe, it will trigger a pre-defined handler of your choice. You can use this to send an email, pager duty, add more resque workers, restart resque, send you a txt...whatever suits you."

Been used in production and works pretty well for me thus far.
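
I haven't verified the gem's exact configuration API, but the core idea can be sketched in plain Resque/Ruby: enqueue a trivial heartbeat job and fire a handler of your choice if it hasn't run within your timeframe. The class name, Redis key and timeout below are made up for illustration:

# Rough sketch of the idea (not the gem's actual API): a heartbeat job plus a
# watchdog loop that alerts when the job stops being processed in time.
require 'resque'

class HeartbeatJob
  @queue = :default
  def self.perform
    Resque.redis.set('stuck_queue:heartbeat', Time.now.to_i)
  end
end

TIMEOUT = 300 # seconds

loop do
  Resque.enqueue(HeartbeatJob)
  sleep TIMEOUT
  last_beat = Resque.redis.get('stuck_queue:heartbeat').to_i
  if Time.now.to_i - last_beat > TIMEOUT
    warn "Resque has not processed a heartbeat job in #{TIMEOUT}s" # your handler here
  end
end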

Shai

Here's how you can purge them from Redis by hostname (`hostname` in the snippet below is the name of the decommissioned machine). This happens to me when I decommission a server and its workers do not exit gracefully.

Resque.workers.each { |w| w.unregister_worker if w.id.start_with?(hostname) }
Rich Sutton

I ran into this issue and started down the path of implementing a lot of the suggestions here. However, I discovered that the root cause was the redis-rb gem at version 3.3.0. Downgrading to redis-rb 3.2.2 prevented the workers from getting stuck in the first place.
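
If you want to try the same fix, one way is to pin the gem in your Gemfile (the version number is taken from the answer above) and then run bundle update redis:

# Gemfile: pin redis-rb to 3.2.2, the version that avoided the stuck workers here
gem 'redis', '3.2.2'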

Will Bryant

I've cleared them out from redis-cli directly. Luckily, redistogo.com allows access from environments outside Heroku. Get the dead worker's ID from the list. Mine was:

55ba6f3b-9287-4f81-987a-4e8ae7f51210:2

Then run this command in Redis directly:

del "resque:worker:55ba6f3b-9287-4f81-987a-4e8ae7f51210:2:*"

You can MONITOR the Redis DB to see what it's doing behind the scenes:

redis xxx.redistogo.com> MONITOR
OK
1380274567.540613 "MONITOR"
1380274568.345198 "incrby" "resque:stat:processed" "1"
1380274568.346898 "incrby" "resque:stat:processed:c65c8e2b-555a-4a57-aaa6-477b27d6452d:2:*" "1"
1380274568.346920 "del" "resque:worker:c65c8e2b-555a-4a57-aaa6-477b27d6452d:2:*"
1380274568.348803 "smembers" "resque:queues"

The second-to-last line deletes the worker.

Andrei R
  • Not a good idea. This won't call the unregister hooks in Resque, so failure handling and any cleanup code people may have won't run. – Jeremy Feb 08 '16 at 06:49
  • This was useful with resque 2 years ago, when it was showing stuck jobs that were impossible to delete using the interface and there was no clean way to do it in Rails. – Andrei R Feb 10 '16 at 12:08

In resque 2.0.0, here's one way that seems to work to remove only the apparently dead workers:

Resque::Worker.all_workers_with_expired_heartbeats.each { |w| w.unregister_worker }

I am not an expert in what's going on; it's possible there's a better way to do this, or that this will have problems. I'm just trying to figure this out too.

This seems to remove workers that haven't sent a "heartbeat" in much longer than expected from the resque worker list.

If the phantom worker was in a "running" state, then a new entry in the "failed" job queue will be created corresponding to the phantom job.
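
If you want to inspect or clean up those failed entries afterwards, the standard Resque::Failure API should cover it:

# Inspect the failed queue left behind by unregistering phantom "running" workers
Resque::Failure.count      # number of failed jobs
Resque::Failure.all(0, 5)  # first five failure payloads
# Resque::Failure.clear    # uncomment to wipe the failed queue entirely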

jrochkind

I had stuck/stale resque workers here too, or should I say 'jobs', because the worker is actually still there and running fine; it's the forked process that is stuck.

I chose the brutal solution of killing any forked process that had been "Processing" for more than 5 minutes, via a bash script; the worker then just spawns the next job in the queue and everything keeps going.

Have a look at my script here: https://gist.github.com/jobwat/5712437

jobwat

If you are using a newer version of Resque, you'll need to use the following command, as the internal APIs have changed:

Resque::WorkerRegistry.working.each {|work| Resque::WorkerRegistry.remove(work.id)}
lloydpick

This avoids the problem as long as you have a resque version newer than 1.26.0 (the line below is a Procfile-style worker entry):

resque: env QUEUE=foo TERM_CHILD=1 bundle exec rake resque:work

Keep in mind that it does not let the currently running job finish.

Joakim Kolsjö

If you use Docker, you can also restart the worker's container, where <id> is the container id:

docker stop <id>

docker start <id>
Sandip Subedi