We have a number of Datadog monitors that work wonderfully for identifying key issues in the system. We also have them integrated with PagerDuty for alerting our teams and organizing responses.

This all works great, but the problem we're running into is that the monitors are all set up with rules similar to "If x logs appear over y duration, alert", which kicks off a PagerDuty alert. However, after y duration (which can be very short), both the monitor and the PagerDuty alert are resolved, even though there may not have been enough time to respond.

How can I configure a monitor which will not automatically resolve, and requires manual intervention to move it back to the 'OK' state?

Marisa

1 Answer

By default, Datadog monitors will not automatically resolve and will stay in their triggered state until manually resolved. If your monitors are resolving automatically, then they likely have recovery thresholds set.

To remove the recovery thresholds, edit your monitor, then in the "Set alert conditions" section, open "Advanced options" and delete the values for the recovery thresholds.
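If you manage monitors as JSON (for example, exported definitions) rather than through the UI, the same change could be sketched roughly as below. This assumes the recovery thresholds live under `options.thresholds` as `critical_recovery` and `warning_recovery`; the monitor shown is hypothetical, and the snippet only edits a local dict, it does not call Datadog.

```python
import copy

# Threshold keys assumed to hold the recovery values in a monitor definition.
RECOVERY_KEYS = ("critical_recovery", "warning_recovery")


def strip_recovery_thresholds(monitor: dict) -> dict:
    """Return a copy of a monitor definition without recovery thresholds."""
    updated = copy.deepcopy(monitor)  # leave the caller's dict untouched
    thresholds = updated.get("options", {}).get("thresholds", {})
    for key in RECOVERY_KEYS:
        thresholds.pop(key, None)  # no error if the key is absent
    return updated


# Hypothetical log monitor definition, for illustration only.
monitor = {
    "name": "Too many error logs",
    "type": "log alert",
    "query": 'logs("status:error").index("*").rollup("count").last("5m") > 10',
    "options": {"thresholds": {"critical": 10, "critical_recovery": 5}},
}

cleaned = strip_recovery_thresholds(monitor)
print(cleaned["options"]["thresholds"])  # {'critical': 10}
```

The cleaned definition would then be saved back however you normally update monitors (UI import, API, or Terraform).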

jyee
  • Is it possible to have a monitor that just alerts but never sends a recovery alert? It would send an alert for an error but then wouldn't send a recovery alert. I have one that evaluates every 5 minutes, sends an alert, and then proceeds to send a recovery. – user1555190 Apr 11 '23 at 10:51
  • @user1555190 This could help you: https://stackoverflow.com/questions/62345348/datadog-events-are-auto-recovered – Srinath Sureshkumar May 22 '23 at 07:57