
We are receiving a lot of Confluent Control Center alerts about topics being under-replicated. We don't think these are real issues, because the alerts keep flapping on and off. This may be caused by having too tight a value for replica.lag.time.max.ms. This setting controls when a follower replica is considered out of sync and is therefore removed from the in-sync replica (ISR) list.
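To make the trade-off concrete, here is a deliberately simplified model of the rule described above. It is a sketch, not Kafka's actual implementation: it assumes the leader records, for each follower, the last time that follower was fully caught up, and treats the follower as out of sync once that is older than replica.lag.time.max.ms. The names `Follower`, `out_of_sync` and `last_caught_up_ms` are hypothetical, and the timestamps are made up.

```python
from dataclasses import dataclass


@dataclass
class Follower:
    """Simplified view of one follower replica as tracked by the partition leader (hypothetical)."""
    broker_id: int
    last_caught_up_ms: int  # last time this follower had caught up to the leader's log end


def out_of_sync(followers, now_ms, replica_lag_time_max_ms):
    """Return broker ids that this simplified rule would drop from the ISR."""
    return [
        f.broker_id
        for f in followers
        if now_ms - f.last_caught_up_ms > replica_lag_time_max_ms
    ]


# Follower 1 last caught up 5 s ago, follower 2 last caught up 18 s ago (made-up data).
followers = [Follower(1, 95_000), Follower(2, 82_000)]
print(out_of_sync(followers, now_ms=100_000, replica_lag_time_max_ms=10_000))  # [2]
print(out_of_sync(followers, now_ms=100_000, replica_lag_time_max_ms=30_000))  # []
```

With the larger setting, the same 18-second stall no longer counts as out of sync, which is why relaxing the value would reduce flapping. The cost is that a follower that is genuinely stuck also takes up to that long before it is reported.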

We could relax this value and receive fewer alerts, but how do we make sure this doesn't end up hiding real problems?

Is there an expected normal number of these alerts that we can target? Or are there other metrics we can use to assess the health of our replicas after relaxing the setting?

  • What tool are you using to alert? You should add a tag for that... In any case, could you set a time-window for the actual alert trigger? For example, in AlertManager, a metric might start in an alert state, but the actual alert message might not get triggered unless that condition holds for some amount of time – OneCricketeer Feb 16 '22 at 20:30
  • The alerts are triggered from Confluent Control Center. These alerts don't have an option to require the condition to hold for some amount of time before triggering. – Gabriel Solano Feb 17 '22 at 00:11
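The hold-for-some-time behaviour suggested in the first comment can be sketched as a small filter sitting between the raw signal and the notification. Since the second comment notes that Control Center does not offer this, the sketch assumes it would live in some separate pipeline that sees a boolean "is anything under-replicated?" sample at regular intervals; the class `HoldDownAlert`, its parameters and the sample data are all hypothetical.

```python
class HoldDownAlert:
    """Fire only after a condition has stayed true for at least hold_ms milliseconds (hypothetical helper)."""

    def __init__(self, hold_ms):
        self.hold_ms = hold_ms
        self.true_since_ms = None  # when the condition most recently became true

    def observe(self, condition, now_ms):
        """Record one sample; return True if the alert should be firing now."""
        if not condition:
            self.true_since_ms = None  # condition cleared: reset the timer
            return False
        if self.true_since_ms is None:
            self.true_since_ms = now_ms  # condition just became true: start the timer
        return now_ms - self.true_since_ms >= self.hold_ms


# Samples of an "is any partition under-replicated?" check taken every 10 s (made-up data).
samples = [True, False, True, True, False, True, True, True, True, True, True, True]
alert = HoldDownAlert(hold_ms=60_000)
fired = [alert.observe(c, i * 10_000) for i, c in enumerate(samples)]
# The short blips never fire; only the final sample, after 60 s of continuous
# under-replication, does.
print(fired)
```

The design choice here is to leave replica.lag.time.max.ms alone and debounce at the alerting layer instead, so brief ISR shrinks stay visible in metrics while only sustained ones page someone.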
