
I'm trying to write a log-based alert policy in terraform.

I want to generate an alert, in near real time, whenever a certain message appears in the logs. Specifically, I want to know when a Composer DAG fails.

I managed to successfully set up a log-based alert in the console with the following query filter:

resource.type="cloud_composer_environment"
severity="ERROR"
log_name="projects/my_project/logs/airflow-scheduler"
resource.labels.project_id="project-id"
textPayload=~"my_dag_name"

But, I am having trouble translating this log-based alert policy into terraform as a "google_monitoring_alert_policy".

I have tried to add the following filter conditions to the terraform google_monitoring_alert_policy:

filter = "resource.type=cloud_composer_environment AND resource.label.project_id=${var.project} AND log_name=projects/${var.project}/logs/airflow-scheduler AND severity=ERROR AND textPayload=~my_dag_name"

But when running terraform apply, I get the following error:

build   10-Nov-2022 12:21:00    Error: Error creating AlertPolicy: googleapi: Error 400: Field alert_policy.conditions[0].condition_threshold.filter had an invalid value of "resource.type=cloud_composer_environment AND resource.labels.project_id=my_project AND log_name=projects/my_project/logs/airflow-scheduler AND severity=ERROR AND textPayload=my_dag_name": The lefthand side of each expression must be prefixed with one of {group, metadata, metric, project, resource}.

So I have two questions:

  1. Can "log-based" alerts be configured in terraform at all?

  2. How do I set up an alert in terraform that filters for a particular string in the log 'textPayload' field?

femeloper

1 Answer


As I understand it, you want to create a log-based metric.

In this case, you first need to create this log-based metric with Terraform:

Example with metrics configured in a JSON file, logging_metrics.json:

{
    "metrics": { 
        "composer_dags_tasks_bigquery_errors": {
            "name": "composer_dags_tasks_bigquery_errors",
            "filter": "severity=ERROR AND resource.type=\"cloud_composer_environment\" AND textPayload =~ \"{taskinstance.py:.*} ERROR -.*bigquery.googleapis.com/bigquery/v2/projects\"",
            "description": "Metric for Cloud Composer Bigquery tasks errors.",
            "metric_descriptor": {
                "metric_kind": "DELTA",
                "value_type": "INT64",
                "labels": [
                    {
                        "key": "task_id",
                        "value_type": "STRING",
                        "description": "Task ID of current Airflow task",
                        "extractor": "EXTRACT(labels.\"task-id\")"
                    },
                    {
                        "key": "execution_date",
                        "value_type": "STRING",
                        "description": "Execution date of the current Airflow task",
                        "extractor": "EXTRACT(labels.\"execution-date\")"
                    }
                ]
            }
        }
    }
}

This metric filters BigQuery errors in the Composer logs. I used label extractors on the DAG task_id and the task execution_date so that the metric time series are distinguished by these parameters.

Load the metric configuration from the JSON file in a locals.tf file and create the google_logging_metric resources from it:

locals {
  logging_metrics = jsondecode(file("${path.module}/resource/logging_metrics.json"))["metrics"]
}

# Create one log-based metric per entry in the JSON file.
resource "google_logging_metric" "logging_metrics" {
  for_each    = local.logging_metrics
  project     = var.project_id
  name        = each.value["name"]
  filter      = each.value["filter"]
  description = each.value["description"]

  metric_descriptor {
    metric_kind = each.value["metric_descriptor"]["metric_kind"]
    value_type  = each.value["metric_descriptor"]["value_type"]

    dynamic "labels" {
      for_each = try(each.value["metric_descriptor"]["labels"], [])
      content {
        key         = try(labels.value["key"], null)
        value_type  = try(labels.value["value_type"], null)
        description = try(labels.value["description"], null)
      }
    }
  }

  label_extractors = { for label in try(each.value["metric_descriptor"]["labels"], []) : label.key => label.extractor }
}
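
As a side note, if you only need the single DAG-failure metric from the question and don't want the JSON indirection, a minimal sketch could look like the following. This is an illustration, not part of the original answer: the metric name composer_dag_failures is made up, var.project_id stands in for your project variable, and my_dag_name is the string from the question's filter.

# Minimal sketch (illustrative names): a single log-based metric counting
# scheduler ERROR entries that mention the DAG name from the question.
resource "google_logging_metric" "composer_dag_failures" {
  project = var.project_id
  name    = "composer_dag_failures"
  filter  = "resource.type=\"cloud_composer_environment\" AND severity=ERROR AND log_name=\"projects/${var.project_id}/logs/airflow-scheduler\" AND textPayload=~\"my_dag_name\""

  metric_descriptor {
    metric_kind = "DELTA"
    value_type  = "INT64"
  }
}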

Then create the alerting policy resource based on the previously created log-based metric:

resource "google_monitoring_alert_policy" "alert_policy" {
  project = var.project_id
  display_name = "alert_name"
  combiner = "..."
  conditions {
    display_name = "alert_name"
    condition_threshold {
      filter = "metric.type=\"logging.googleapis.com/user/composer_dags_tasks_bigquery_errors\" AND resource.type=\"cloud_composer_environment\""
      ...........
}

The alerting policy resource references the previously created log-based metric via metric.type.
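
For completeness, here is one hedged way the elided parts of that policy might be filled in. The combiner, comparison, duration, and aggregation values below are illustrative choices, not taken from the original answer, and the display names are made up:

resource "google_monitoring_alert_policy" "alert_policy" {
  project      = var.project_id
  display_name = "composer-bigquery-task-errors"
  combiner     = "OR"

  conditions {
    display_name = "composer-bigquery-task-errors"
    condition_threshold {
      filter          = "metric.type=\"logging.googleapis.com/user/composer_dags_tasks_bigquery_errors\" AND resource.type=\"cloud_composer_environment\""
      comparison      = "COMPARISON_GT"   # alert when the count exceeds the threshold
      threshold_value = 0
      duration        = "0s"

      aggregations {
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_SUM"  # sum the DELTA counts in each alignment window
      }
    }
  }

  # Optional, illustrative: attach existing notification channels, e.g.
  # notification_channels = [google_monitoring_notification_channel.email.id]
}

With settings like these, the policy fires whenever the log-based metric records at least one matching log entry within an alignment window.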

Mazlum Tosun
  • I tried this but it didn't make a difference, so this isn't the answer to the question, I'm afraid to say. I think that there needs to be some configuration with the "labels" but I can't get it working – femeloper Nov 10 '22 at 16:22
  • Sorry, I am going to edit my answer, I understood the problem – Mazlum Tosun Nov 10 '22 at 16:40
  • I edited my answer to help you in another direction. – Mazlum Tosun Nov 10 '22 at 16:59
  • GCP documentation says there are 2 ways to set up alerting policies: 1. metric-based or 2. log-based. I set up a log-based alert policy in the console that generated the alerts as I expected. I want to translate this into terraform but I'm having trouble because it does not allow me to add a filter on "textPayload". It looks like I need to set up a "metric-based" alert with a metric that has a label and a label extractor expression, and then a corresponding alert policy. This approach requires configuring 2 resources in terraform rather than simply a "log-based" alert policy. – femeloper Nov 14 '22 at 11:33
  • I think this thread answers my question and it seems that it is not yet possible to set up a log-based alert policy in terraform: https://stackoverflow.com/questions/68938876/terraform-google-provider-create-log-based-alerting-policy – femeloper Nov 14 '22 at 11:38
  • 1
    Yes, I also edited this thread to orient you in this direction. Create log based metric, then create alerting policy based on this log based metric. – Mazlum Tosun Nov 14 '22 at 11:45