
I have a Spring Boot application deployed on an AWS ECS Fargate cluster. As a sidecar, I have deployed an "adot-collector" container to scrape metrics from Amazon Elastic Container Service (Amazon ECS) and ingest them into Amazon Managed Service for Prometheus using AWS Distro for OpenTelemetry (ADOT).

My Spring app exposes the endpoint "/actuator/prometheus" on port 8080, which serves my Java application metrics, and I want ADOT to scrape this endpoint. Below is the ADOT collector config I have in place.

adot.config.yaml

receivers:
  prometheus:
    config:
      global:
        scrape_interval: 15s
        scrape_timeout: 10s
      scrape_configs:
      - job_name: "prometheus"
        static_configs:
        - targets: [ 0.0.0.0:9090 ]
      - job_name: "my-spring-app"
        metrics_path: "actuator/prometheus"
        static_configs:
        - targets: [ 0.0.0.0:8080 ]

  awsecscontainermetrics:
    collection_interval: 10s
processors:
  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          - ecs.task.memory.utilized
          - ecs.task.memory.reserved
          - ecs.task.cpu.utilized
          - ecs.task.cpu.reserved
          - ecs.task.network.rate.rx
          - ecs.task.network.rate.tx
          - ecs.task.storage.read_bytes
          - ecs.task.storage.write_bytes
exporters:
  prometheusremotewrite:
    endpoint: https://xxx/remote_write
    auth:
      authenticator: sigv4auth
  logging:
    loglevel: info
extensions:
  health_check:
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679
  sigv4auth:
    region: us-west-2
    service: aps
    assume_role:
      arn:
      sts_region: eu-west-2
service:
  extensions: [pprof, zpages, health_check, sigv4auth]
  telemetry:
    logs:
      level: debug
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [logging, prometheusremotewrite]
    metrics/ecs:
      receivers: [awsecscontainermetrics]
      processors: [filter]
      exporters: [logging, prometheusremotewrite]

The ECS task definition looks like this:

task definition:

{
  "family": "adot-prom",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "adot-collector",
      "image": "account_id.dkr.ecr.region.amazonaws.com/image-tag",
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/ecs-adot-collector",
          "awslogs-region": "my-region",
          "awslogs-stream-prefix": "ecs",
          "awslogs-create-group": "True"
        }
      }
    },
    {
      "name": "prometheus",
      "image": "prom/prometheus:main",
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/ecs-prom",
          "awslogs-region": "my-region",
          "awslogs-stream-prefix": "ecs",
          "awslogs-create-group": "True"
        }
      }
    },
    {
      "name": "my-spring-app",
      "image": "ecr repo url",
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/ecs-app",
          "awslogs-region": "my-region",
          "awslogs-stream-prefix": "app",
          "awslogs-create-group": "True"
        },
        "portMappings": [{
          "containerPort": 8080
        }]
      }
    }
  ],
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "cpu": "1024"
}

I am not sure why I always receive the following error when Prometheus tries to scrape metrics from my /actuator/prometheus endpoint, even though the endpoint exists.

Error:

debug scrape/scrape.go:1353 Scrape failed {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "prometheus", "target": "http://0.0.0.0:8080/actuator/prometheus", "error": "server returned HTTP status 404 Not Found", "errorVerbose": "server returned HTTP status 404 Not Found\ngithub.com/prometheus/prometheus/scrape.(*targetScraper).scrape\n\tgithub.com/prometheus/prometheus@v0.43.0/scrape/scrape.go:817\ngithub.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport\n\tgithub.com/prometheus/prometheus@v0.43.0/scrape/scrape.go:1340\ngithub.com/prometheus/prometheus/scrape.(*scrapeLoop).run\n\tgithub.com/prometheus/prometheus@v0.43.0/scrape/scrape.go:1264\nruntime.goexit\n\truntime/asm_amd64.s:1598"}
Abhi.G
  • I'm not sure I follow what you're trying to achieve here. You want to use ADOT to scrape Prometheus? Did you have a look at the [docs](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/application-metrics-prometheus.html) already? I don't know what `account_id.dkr.ecr.region.amazonaws.com/image-tag` is, but for ADOT you should be using `public.ecr.aws/aws-observability/aws-otel-collector:v0.29.0`. – Michael Hausenblas May 25 '23 at 07:31
  • I want ADOT to scrape my Spring actuator endpoint, where my application metrics are available, but it's failing with the error I mentioned above. – Abhi.G May 25 '23 at 09:49
  • What Spring actuator endpoint? All you have in your task are Prometheus, and ADOT. There is no Spring application in your task definition. It's also weird that you don't have any port mappings defined in the task definition. – Mark B May 25 '23 at 11:30
  • I have updated my question a little. Basically, "my-spring-app" is a Spring Boot app that exposes the API "actuator/prometheus" on port 8080, which I want the ADOT collector to scrape. That is not happening; it fails with the error I posted. – Abhi.G May 25 '23 at 16:32

1 Answer


I tried to replicate your setup, but since you don't provide details about the container image you use, I had to come up with a similar setup that does what you want and works as expected.

Using the official ADOT image and the appropriate ECS configuration, the following ECS task definition works (you will have to replace the xxxxxxxxxxx values with your own, such as your account ID and AMP workspace ID):

{
    "taskDefinitionArn": "arn:aws:ecs:eu-west-1:xxxxxxxxxxx:task-definition/adot:1",
    "containerDefinitions": [
        {
            "name": "aws-otel-collector",
            "image": "public.ecr.aws/aws-observability/aws-otel-collector:v0.29.1",
            "essential": true,
            "command": [
                "--config=/etc/ecs/ecs-amp-prometheus.yaml"
            ],
            "environment": [
                {
                    "name": "AWS_PROMETHEUS_SCRAPING_ENDPOINT",
                    "value": "localhost:8765"
                },
                {
                    "name": "AWS_REGION",
                    "value": "eu-west-1"
                },
                {
                    "name": "AWS_PROMETHEUS_ENDPOINT",
                    "value": "https://aps-workspaces.eu-west-1.amazonaws.com/workspaces/xxxxxxxxxxx/api/v1/remote_write"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-create-group": "true",
                    "awslogs-group": "/ecs/ecs-aws-otel-sidecar-collector",
                    "awslogs-region": "eu-west-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        },
        {
            "name": "load-gen",
            "image": "public.ecr.aws/h0h9t7p1/alpine-bash-curl-jq:latest",
            "portMappings": [
                {
                    "name": "load-gen-80-tcp",
                    "containerPort": 80,
                    "hostPort": 80,
                    "protocol": "tcp",
                    "appProtocol": "http"
                }
            ],
            "essential": true,
            "command": [
                "/bin/bash",
                "-c",
                "sleep 15; while : ; do curl -s -o /dev/null localhost:8765 ; sleep 1; done"
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-create-group": "true",
                    "awslogs-group": "/ecs/adot",
                    "awslogs-region": "eu-west-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        },
        {
            "name": "metrics-source",
            "image": "public.ecr.aws/mhausenblas/ho11y:stable",
            "cpu": 0,
            "portMappings": [
                {
                    "name": "metrics-source-8765-tcp",
                    "containerPort": 8765,
                    "hostPort": 8765,
                    "protocol": "tcp",
                    "appProtocol": "http"
                }
            ],
            "essential": true,
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-create-group": "true",
                    "awslogs-group": "/ecs/adot",
                    "awslogs-region": "eu-west-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        }
    ],
    "family": "adot",
    "taskRoleArn": "arn:aws:iam::xxxxxxxxxxx:role/ecsTaskExecutionRole",
    "executionRoleArn": "arn:aws:iam::xxxxxxxxxxx:role/ecsTaskExecutionRole",
    "networkMode": "awsvpc",
    "revision": 1,
    "requiresAttributes": [
        {
            "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
        },
        {
            "name": "ecs.capability.execution-role-awslogs"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
        },
        {
            "name": "com.amazonaws.ecs.capability.task-iam-role"
        },
        {
            "name": "ecs.capability.extensible-ephemeral-storage"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
        },
        {
            "name": "ecs.capability.task-eni"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.29"
        }
    ],
    "placementConstraints": [],
    "compatibilities": [
        "EC2",
        "FARGATE"
    ],
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "cpu": "2048",
    "memory": "4096",
    "ephemeralStorage": {
        "sizeInGiB": 21
    },
    "runtimePlatform": {
        "cpuArchitecture": "X86_64",
        "operatingSystemFamily": "LINUX"
    }
}

When you then use the AMP workspace as a data source in AMG you see the result (here shown in Explore):

[Screenshot: Prometheus metrics from ECS task ingested into AMP, visualized in AMG]
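
For the Spring Boot case in the question, the same idea can be sketched as a custom `prometheus` receiver section. This is only a sketch under assumptions, not a tested configuration: the job name, port 8080, and the `/actuator/prometheus` path are taken from the question, and the target uses `localhost` because containers in one `awsvpc` task share a network namespace. It also assumes the application actually serves that path (the `prometheus` actuator endpoint is exposed and the Micrometer Prometheus registry is on the classpath). A 404 means an HTTP server answered the request, so that is worth confirming from inside the task.

```yaml
# Sketch only: receiver section of a custom ADOT collector config.
# The rest of the config (exporters, extensions, pipelines) stays as in the question.
receivers:
  prometheus:
    config:
      global:
        scrape_interval: 15s
        scrape_timeout: 10s
      scrape_configs:
        - job_name: "my-spring-app"
          # Written with a leading slash; /actuator is Spring Boot's default base path.
          metrics_path: "/actuator/prometheus"
          static_configs:
            # Sidecar and app share the task's network namespace,
            # so the app is reachable on localhost rather than 0.0.0.0.
            - targets: ["localhost:8080"]
```

If the application sets a servlet context path or a different management port, the `metrics_path` and target above would need to change accordingly.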

Michael Hausenblas
  • Can you please answer for https://stackoverflow.com/questions/76627553/communication-failure-between-spring-boot-app-and-aws-distro-for-opentelemetry? – Harsh Kanakhara Jul 06 '23 at 09:52