20

I just upgraded to Celery 3.1 and now I see this in my logs:

    on_node_lost - INFO - missed heartbeat from celery@queue_name

for every queue/worker in my cluster.

According to the docs BROKER_HEARTBEAT is off by default and I haven't configured it.

Should I explicitly set BROKER_HEARTBEAT=0 or is there something else that I should be checking?

Douglas Ferguson

4 Answers

16

Celery 3.1 added the new mingle and gossip procedures. I too was getting a ton of missed heartbeats, and passing --without-gossip to my workers cleared it up.

https://docs.celeryproject.org/en/3.1/whatsnew-3.1.html#mingle-worker-synchronization

Mingle: Worker synchronization

The worker will now attempt to synchronize with other workers in the same cluster.

Synchronized data currently includes revoked tasks and logical clock.

This only happens at startup and causes a one second startup delay to collect broadcast responses from other workers.

You can disable this bootstep using the --without-mingle argument.

https://docs.celeryproject.org/en/3.1/whatsnew-3.1.html#gossip-worker-worker-communication

Gossip: Worker <-> Worker communication

Workers are now passively subscribing to worker related events like heartbeats.

This means that a worker knows what other workers are doing and can detect if they go offline. Currently this is only used for clock synchronization, but there are many possibilities for future additions and you can write extensions that take advantage of this already.

Some ideas include consensus protocols, reroute task to best worker (based on resource usage or data locality) or restarting workers when they crash.

We believe that although this is a small addition, it opens amazing possibilities.

You can disable this bootstep using the --without-gossip argument.
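For anyone launching workers from a script, here is a minimal sketch of how the command line could be assembled with those flags. The app name `proj`, the `worker_command` helper, and the `--loglevel=INFO` choice are illustrative assumptions, not part of the answer; only `--without-gossip` and `--without-mingle` come from the docs quoted above.

```python
def worker_command(app, without_gossip=True, without_mingle=False):
    """Build an argv list for `celery worker` with optional bootsteps disabled.

    `app` is a hypothetical Celery app module name (e.g. "proj").
    """
    argv = ["celery", "-A", app, "worker", "--loglevel=INFO"]
    if without_gossip:
        # Stop subscribing to other workers' events (heartbeats etc.).
        argv.append("--without-gossip")
    if without_mingle:
        # Skip the startup synchronization with other workers.
        argv.append("--without-mingle")
    return argv


# Print the command a process manager or subprocess.run() could use.
print(" ".join(worker_command("proj")))
```

Passing the resulting list to `subprocess.run()` or pasting the printed line into a supervisor config should have the same effect as typing the flags by hand.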

Trevor Boyd Smith
user3204501
  • Just a quick note, FYI: there is a commercial operator of RabbitMQ that publishes recommended settings for `celery`: https://www.cloudamqp.com/docs/celery.html#commandline-arguments . They also recommend `--without-gossip`. I'm not advocating either way; I'm just offering more information that can inform your decision. – Trevor Boyd Smith Jan 31 '22 at 18:46
  • For more information on the consequences of turning off `gossip` or `mingle` or `heartbeat`, please see: [what-are-the-consequences-of-disabling-gossip-mingle-and-heartbeat-for-celery-w](https://stackoverflow.com/questions/55249197) and [application-impacts-of-celery-workers-running-with-the-without-heartbeat-fla](https://stackoverflow.com/questions/66978028) – Trevor Boyd Smith Jan 31 '22 at 18:49
11

Saw the same thing, and noticed a couple of things in the log files.

1) There were messages about time drift at the start of the log and occasional missed heartbeats.

2) At the end of the log file, the drift messages went away and only the missed heartbeat messages were present.

3) There were no changes to the system when the drift messages went away... They just stopped showing up.

I figured that the drift itself was likely the problem.

After syncing the time on all the servers involved, these messages went away. On Ubuntu, run ntpdate from cron, or run ntpd.
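To make the drift idea concrete, here is a rough sketch of the comparison involved: the timestamp a remote worker stamped on a message versus the local time when it arrived. The function names and the 16-second threshold are assumptions chosen for illustration; they are not taken from my logs or from Celery's code.

```python
import time


def clock_drift(event_timestamp, received_at=None):
    """Absolute difference in seconds between a sender's timestamp
    and the local time at which the message was received."""
    if received_at is None:
        received_at = time.time()
    return abs(received_at - event_timestamp)


def looks_out_of_sync(event_timestamp, received_at=None, max_drift=16.0):
    """True when the drift exceeds `max_drift` seconds.

    `max_drift` is an illustrative threshold, not a documented default.
    """
    return clock_drift(event_timestamp, received_at) > max_drift


# A message stamped 30 seconds "ago" by a host whose clock runs behind.
print(looks_out_of_sync(event_timestamp=1000.0, received_at=1030.0))
```

If the hosts' clocks disagree by more than the heartbeat window, a heartbeat that arrived on time can still look late, which would match what I saw.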

user3691996
1

I had a similar issue and found the reason in my case.

I have two servers running workers.

When I ping one server from the other and the ping time is larger than 2 seconds, the log shows "missed heartbeat from celery@". The default heartbeat interval is 2 seconds.

The cause is my poor network. See http://docs.celeryproject.org/en/latest/internals/reference/celery.worker.heartbeat.html
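If ICMP ping is not available, a TCP connect time gives a similar rough number. This is only a sketch: the host name, the port (5672 is the usual AMQP port), and both function names are placeholders I made up, and a TCP connect is not the same measurement as ping.

```python
import socket
import time

HEARTBEAT_INTERVAL = 2.0  # seconds, the default interval mentioned above


def tcp_round_trip(host, port, timeout=5.0):
    """Seconds taken to open a TCP connection to (host, port)."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        return time.monotonic() - start


def exceeds_heartbeat(rtt_seconds, interval=HEARTBEAT_INTERVAL):
    """True when a measured round trip is longer than the heartbeat interval."""
    return rtt_seconds > interval


# Example usage (hypothetical host; uncomment to try against your own server):
# rtt = tcp_round_trip("other-worker.example.com", 5672)
# print(rtt, exceeds_heartbeat(rtt))
```

Running this a few times between the two servers should show whether the slow periods line up with the "missed heartbeat" lines in the log.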

mutex86
-1

Add --without-mingle when you start Celery.

Flora
  • Please add further details to expand on your answer, such as working code or documentation citations. – Community Aug 30 '21 at 14:00
  • If you have a new question, please ask it by clicking the [Ask Question](https://stackoverflow.com/questions/ask) button. Include a link to this question if it helps provide context. – Mousam Singh Aug 30 '21 at 15:31