I'm trying to create an alerting policy for a gauge-type metric in Google Cloud Platform that should trigger when either:
- the gauge's value is below 1
- data is absent
Using Terraform, I came up with the following definition:
resource "google_monitoring_alert_policy" "alert_policy_prometheus_metric" {
display_name = "Metric check failed"
conditions {
display_name = "Metric violation"
condition_threshold {
filter = "resource.type = \"prometheus_target\" AND resource.labels.cluster = \"${var.cluster_name}\" AND metric.type = \"prometheus.googleapis.com/elasticsearch_cluster_health_status/gauge\" AND metric.labels.color = \"green\""
evaluation_missing_data = "EVALUATION_MISSING_DATA_ACTIVE"
comparison = "COMPARISON_LT"
duration = "60s"
trigger {
count = 1
}
threshold_value = 1
}
}
conditions {
display_name = "Metric absent"
condition_absent {
duration = "120s"
filter = "resource.type = \"prometheus_target\" AND resource.labels.cluster = \"${var.cluster_name}\" AND metric.type = \"prometheus.googleapis.com/elasticsearch_cluster_health_status/gauge\" AND metric.labels.color = \"green\""
}
}
combiner = "OR"
notification_channels = [
"${var.monitoring_email_group_name}"
]
}
This does work; however, it creates two separate incidents when the following happens:
- the metric starts being absent
- the metric then reappears with a value below 1

This is surprising to me, since I use OR as the combiner for the conditions. Is there anything I can do to merge the two conditions so they open a single incident?
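
One idea I've been considering, though I haven't verified it behaves the same, is to drop the condition_absent block entirely and rely on evaluation_missing_data = "EVALUATION_MISSING_DATA_ACTIVE" in the threshold condition, since that setting should already evaluate the condition as violated when data stops arriving. Roughly:

resource "google_monitoring_alert_policy" "alert_policy_prometheus_metric" {
  display_name = "Metric check failed"

  # A single condition, so only one incident can be opened:
  # both a value below 1 and missing data would breach it.
  conditions {
    display_name = "Metric violation or absent"

    condition_threshold {
      filter                  = "resource.type = \"prometheus_target\" AND resource.labels.cluster = \"${var.cluster_name}\" AND metric.type = \"prometheus.googleapis.com/elasticsearch_cluster_health_status/gauge\" AND metric.labels.color = \"green\""
      comparison              = "COMPARISON_LT"
      threshold_value         = 1
      duration                = "60s"
      # Missing data is evaluated as a violation of this same condition.
      evaluation_missing_data = "EVALUATION_MISSING_DATA_ACTIVE"

      trigger {
        count = 1
      }
    }
  }

  combiner = "OR"

  notification_channels = [
    var.monitoring_email_group_name
  ]
}

Would that be roughly equivalent to the two-condition setup above, or is there a better way to keep both conditions but have them raise a single incident?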