I am trying to run a Spring Boot app in ECS with ADOT as a sidecar.

In the ECS task definition, I am creating a task for the Spring Boot app. I am enabling Use Metrics Collection and selecting Amazon Managed Prometheus (OpenTelemetry Instrumentation), which creates the aws-otel-sidecar-collector container. For the networking mode, I am using awsvpc.

ECS Task Definition

{
    "taskDefinitionArn": "arn:aws:ecs:us-east-1:334998782985:task-definition/ecs-monitoring-demo-td:20",
    "containerDefinitions": [
        {
            "name": "ecs-demo",
            "image": "334998782985.dkr.ecr.us-east-1.amazonaws.com/ecs-demo:cpu18",
            "cpu": 1024,
            "portMappings": [
                {
                    "name": "ecs-demo-8080-tcp",
                    "containerPort": 8080,
                    "hostPort": 8080,
                    "protocol": "tcp",
                    "appProtocol": "http"
                }
            ],
            "essential": true,
            "environment": [
                {
                    "name": "VM_OPTS",
                    "value": "-Dotel.exporter.otlp.protocol=http/protobuf"
                },
                {
                    "name": "SPRING_PROFILE",
                    "value": "custom"
                }
            ],
            "environmentFiles": [],
            "mountPoints": [],
            "volumesFrom": [],
            "ulimits": [],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-create-group": "true",
                    "awslogs-group": "/ecs/ecs-monitoring-demo-td",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        },
        {
            "name": "aws-otel-collector",
            "image": "public.ecr.aws/aws-observability/aws-otel-collector:v0.30.0",
            "cpu": 512,
            "portMappings": [
                {
                    "containerPort": 2000,
                    "hostPort": 2000,
                    "protocol": "udp"
                },
                {
                    "name": "aws-otel-collector-4317-tcp",
                    "containerPort": 4317,
                    "hostPort": 4317,
                    "protocol": "tcp"
                },
                {
                    "containerPort": 8125,
                    "hostPort": 8125,
                    "protocol": "udp"
                }
            ],
            "essential": true,
            "command": [
                "--config=/etc/ecs/ecs-amp.yaml"
            ],
            "environment": [
                {
                    "name": "AWS_PROMETHEUS_ENDPOINT",
                    "value": "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-6b99b650-420b-4239-aa5d-fd67b8ea6fbe/api/v1/remote_write"
                }
            ],
            "mountPoints": [],
            "volumesFrom": [],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-create-group": "true",
                    "awslogs-group": "/ecs/ecs-aws-otel-sidecar-collector",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        }
    ],
    "family": "ecs-monitoring-demo-td",
    "executionRoleArn": "arn:aws:iam::334998782985:role/ecsTaskExecutionRole",
    "networkMode": "awsvpc",
    "revision": 20,
    "volumes": [],
    "status": "ACTIVE",
    "requiresAttributes": [
        {
            "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
        },
        {
            "name": "ecs.capability.execution-role-awslogs"
        },
        {
            "name": "com.amazonaws.ecs.capability.ecr-auth"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
        },
        {
            "name": "ecs.capability.execution-role-ecr-pull"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
        },
        {
            "name": "ecs.capability.task-eni"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.29"
        }
    ],
    "placementConstraints": [],
    "compatibilities": [
        "EC2"
    ],
    "requiresCompatibilities": [
        "EC2"
    ],
    "cpu": "1536",
    "memory": "3072",
    "runtimePlatform": {
        "cpuArchitecture": "X86_64",
        "operatingSystemFamily": "LINUX"
    },
    "registeredAt": "2023-07-06T09:18:08.682Z",
    "registeredBy": "arn:aws:iam::334998782985:root",
    "tags": []
}

I am following https://www.baeldung.com/spring-boot-opentelemetry-setup to set up communication between the Spring Boot app and the OTel Collector. The configuration in my Spring Boot app is as follows.

Dependencies in pom.xml

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-sleuth</artifactId>
        <exclusions>
            <exclusion>
                <groupId>org.springframework.cloud</groupId>
                <artifactId>spring-cloud-sleuth-brave</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-sleuth-otel-autoconfigure</artifactId>
    </dependency>
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-exporter-otlp</artifactId>
    </dependency>
</dependencies>
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>${spring-cloud.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-sleuth-otel-dependencies</artifactId>
            <version>1.1.3</version>
            <scope>import</scope>
            <type>pom</type>
        </dependency>
    </dependencies>
</dependencyManagement>

In application.properties

server.port = 8080
spring.application.name=ECS-Monitoring
logging.level.io.opentelemetry=debug
spring.sleuth.otel.config.trace-id-ratio-based=1.0
spring.sleuth.otel.exporter.otlp.endpoint=http://localhost:4318
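
The endpoint above assumes the collector is reachable on localhost, which should hold in awsvpc mode because containers in the same task share a network namespace. To rule out basic connectivity problems, a small probe like the one below could be run from inside the app container. It is only a sketch: the class name `CollectorPortProbe` and the 1-second timeout are my own choices, and a successful TCP connect says nothing about whether the protocol spoken on that port matches the exporter.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class CollectorPortProbe {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // 4317 and 4318 are the gRPC and HTTP ports from the collector config.
        for (int port : new int[] {4317, 4318}) {
            System.out.println("localhost:" + port + " reachable: "
                    + isReachable("localhost", port, 1000));
        }
    }
}
```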

In /etc/ecs/ecs-amp.yaml (created by AWS inside the ecs-aws-otel-sidecar-collector Docker container)

extensions:
  health_check:
  sigv4auth:
    region: $AWS_REGION

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  awsecscontainermetrics:

processors:
  batch/metrics:
    timeout: 60s
  resourcedetection:
    detectors:
      - env
      - system
      - ecs
      - ec2
  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          - ecs.task.memory.reserved
          - ecs.task.memory.utilized
          - ecs.task.cpu.reserved
          - ecs.task.cpu.utilized
          - ecs.task.network.rate.rx
          - ecs.task.network.rate.tx
          - ecs.task.storage.read_bytes
          - ecs.task.storage.write_bytes
          - container.duration

exporters:
  prometheusremotewrite:
    endpoint: $AWS_PROMETHEUS_ENDPOINT
    auth:
      authenticator: sigv4auth
    resource_to_telemetry_conversion:
      enabled: true

service:
  pipelines:
    metrics/application:
      receivers: [otlp]
      processors: [resourcedetection, batch/metrics]
      exporters: [prometheusremotewrite]
    metrics:
      receivers: [awsecscontainermetrics]
      processors: [filter]
      exporters: [prometheusremotewrite]

  extensions: [health_check, sigv4auth]

Both of my Docker containers are running on the same EC2 instance. The logs are below.

CloudWatch Logs for spring boot app

2023-07-06 09:19:21.918  INFO [Custom-App,,] 1 --- [           main] com.ecs.demo.ecsdemo.EcsDemoApplication  : Starting EcsDemoApplication v0.0.1-SNAPSHOT using Java 11.0.19 on ip-172-31-6-36.ec2.internal with PID 1 (/app/opx/app.jar started by root in /app/opx)
2023-07-06 09:19:21.934  INFO [Custom-App,,] 1 --- [           main] com.ecs.demo.ecsdemo.EcsDemoApplication  : The following 1 profile is active: "custom"
2023-07-06 09:19:24.518  INFO [Custom-App,,] 1 --- [           main] o.s.cloud.context.scope.GenericScope     : BeanFactory id=a51cc911-915e-33cf-8578-687892020128
2023-07-06 09:19:26.125  INFO [Custom-App,,] 1 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port(s): 8080 (http)
2023-07-06 09:19:26.144  INFO [Custom-App,,] 1 --- [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2023-07-06 09:19:26.145  INFO [Custom-App,,] 1 --- [           main] org.apache.catalina.core.StandardEngine  : Starting Servlet engine: [Apache Tomcat/9.0.75]
2023-07-06 09:19:26.382  INFO [Custom-App,,] 1 --- [           main] o.a.c.c.C.[Tomcat].[localhost].[/env]    : Initializing Spring embedded WebApplicationContext
2023-07-06 09:19:26.382  INFO [Custom-App,,] 1 --- [           main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 4201 ms
2023-07-06 09:19:27.642 DEBUG [Custom-App,,] 1 --- [           main] i.o.sdk.internal.JavaVersionSpecific     : Using the APIs optimized for: Java 9+
2023-07-06 09:19:27.771 DEBUG [Custom-App,,] 1 --- [           main] .i.a.i.EmbeddedInstrumentationProperties : Did not find embedded instrumentation properties file META-INF/io/opentelemetry/instrumentation/org.springframework.cloud.sleuth.properties
2023-07-06 09:19:29.477 DEBUG [Custom-App,,] 1 --- [           main] .i.a.i.EmbeddedInstrumentationProperties : Did not find embedded instrumentation properties file META-INF/io/opentelemetry/instrumentation/io.micrometer.tracing.properties
2023-07-06 09:19:29.776  INFO [Custom-App,,] 1 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8080 (http) with context path '/env'
2023-07-06 09:19:29.822  INFO [Custom-App,,] 1 --- [           main] com.ecs.demo.ecsdemo.EcsDemoApplication  : Started EcsDemoApplication in 9.871 seconds (JVM running for 11.198)
2023-07-06 09:19:34.533 DEBUG [Custom-App,,] 1 --- [nio-8080-exec-1] t.p.B3PropagatorExtractorMultipleHeaders : Invalid TraceId in B3 header: null'. Returning INVALID span context.
2023-07-06 09:19:34.535 DEBUG [Custom-App,,] 1 --- [nio-8080-exec-2] t.p.B3PropagatorExtractorMultipleHeaders : Invalid TraceId in B3 header: null'. Returning INVALID span context.
2023-07-06 09:19:34.580  INFO [Custom-App,befdbd9fc025dc9c146dbab3e090c576,1989f539f67fc492] 1 --- [nio-8080-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/env]    : Initializing Spring DispatcherServlet 'dispatcherServlet'
2023-07-06 09:19:34.580  INFO [Custom-App,befdbd9fc025dc9c146dbab3e090c576,1989f539f67fc492] 1 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet        : Initializing Servlet 'dispatcherServlet'
2023-07-06 09:19:34.582  INFO [Custom-App,befdbd9fc025dc9c146dbab3e090c576,1989f539f67fc492] 1 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet        : Completed initialization in 1 ms
2023-07-06 09:19:34.719  INFO [Custom-App,f573665663bafab697a35f1290e0d0bc,8e608e509ca0edbc] 1 --- [nio-8080-exec-2] c.e.d.ecsdemo.RestControllerEndpoints    : Hello World
2023-07-06 09:19:34.719  INFO [Custom-App,befdbd9fc025dc9c146dbab3e090c576,8f7a8fb281c42a82] 1 --- [nio-8080-exec-1] c.e.d.ecsdemo.RestControllerEndpoints    : Hello World
2023-07-06 09:19:35.023 ERROR [Custom-App,,] 1 --- [alhost:4318/...] i.o.e.internal.grpc.OkHttpGrpcExporter   : Failed to export spans. The request could not be executed. Full error message: Broken pipe (Write failed)
2023-07-06 09:19:35.024 DEBUG [Custom-App,,] 1 --- [alhost:4318/...] i.o.s.trace.export.SimpleSpanProcessor   : Exporter failed
2023-07-06 09:19:35.034 ERROR [Custom-App,,] 1 --- [alhost:4318/...] i.o.e.internal.grpc.OkHttpGrpcExporter   : Failed to export spans. The request could not be executed. Full error message: null
2023-07-06 09:19:35.034 DEBUG [Custom-App,,] 1 --- [alhost:4318/...] i.o.s.trace.export.SimpleSpanProcessor   : Exporter failed
2023-07-06 09:19:35.047 ERROR [Custom-App,,] 1 --- [alhost:4318/...] i.o.e.internal.grpc.OkHttpGrpcExporter   : Failed to export spans. The request could not be executed. Full error message: FRAME_SIZE_ERROR: 4740180
2023-07-06 09:19:35.048 DEBUG [Custom-App,,] 1 --- [alhost:4318/...] i.o.s.trace.export.SimpleSpanProcessor   : Exporter failed
2023-07-06 09:19:35.046 ERROR [Custom-App,,] 1 --- [alhost:4318/...] i.o.e.internal.grpc.OkHttpGrpcExporter   : Failed to export spans. The request could not be executed. Full error message: FRAME_SIZE_ERROR: 4740180
2023-07-06 09:19:35.048 DEBUG [Custom-App,,] 1 --- [alhost:4318/...] i.o.s.trace.export.SimpleSpanProcessor   : Exporter failed
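
One detail in the log above: the exporter class is OkHttpGrpcExporter, so spans are being sent over gRPC (HTTP/2), while the configured endpoint uses port 4318, which the collector config assigns to the OTLP HTTP receiver. An HTTP/2 frame begins with a 24-bit length field, so the sketch below reads the reported frame size 4740180 as three raw bytes. The class name `FrameSizeDecoder` is hypothetical, and this is a diagnostic aid rather than a confirmed root cause.

```java
import java.nio.charset.StandardCharsets;

public class FrameSizeDecoder {

    /** Interprets a 24-bit HTTP/2 frame length as the three ASCII bytes it was read from. */
    static String decodeFrameLength(int length) {
        byte[] bytes = {
            (byte) ((length >> 16) & 0xFF), // most significant byte
            (byte) ((length >> 8) & 0xFF),
            (byte) (length & 0xFF)          // least significant byte
        };
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Value taken from "FRAME_SIZE_ERROR: 4740180" in the app log.
        System.out.println(decodeFrameLength(4740180)); // prints HTT
    }
}
```

The three bytes spell "HTT", which would be consistent with the server answering with an HTTP/1.x status line instead of an HTTP/2 frame. If that reading is right, pointing the exporter at the gRPC port (4317) may be worth trying, though I have not verified that it resolves the problem.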

CloudWatch Logs for ecs-aws-otel-sidecar-collector

2023/07/05 16:11:16 ADOT Collector version: v0.30.0
2023/07/05 16:11:16 found no extra config, skip it, err: open /opt/aws/aws-otel-collector/etc/extracfg.txt: no such file or directory
2023/07/05 16:11:16 attn: users of the prometheus receiver, prometheus exporter or prometheusremotewrite exporter please refer to https://github.com/aws-observability/aws-otel-collector/issues/2043 in regards to an ADOT Collector v0.31.0 breaking change
2023-07-05T16:11:16.924Z    info    service/telemetry.go:104    Setting up own telemetry...
2023-07-05T16:11:16.925Z    info    service/telemetry.go:127    Serving Prometheus metrics  
{
    "address": ":8888",
    "level": "Basic"
}
2023-07-05T16:11:16.928Z    info    filterprocessor@v0.78.0/metrics.go:90   Metric filter configured    
{
    "kind": "processor",
    "name": "filter",
    "pipeline": "metrics",
    "include match_type": "strict",
    "include expressions": [],
    "include metric names": [
        "ecs.task.memory.reserved",
        "ecs.task.memory.utilized",
        "ecs.task.cpu.reserved",
        "ecs.task.cpu.utilized",
        "ecs.task.network.rate.rx",
        "ecs.task.network.rate.tx",
        "ecs.task.storage.read_bytes",
        "ecs.task.storage.write_bytes",
        "container.duration"
    ],
    "include metrics with resource attributes": null,
    "exclude match_type": "",
    "exclude expressions": [],
    "exclude metric names": [],
    "exclude metrics with resource attributes": null
}
2023-07-05T16:11:16.929Z    info    service/service.go:131  Starting aws-otel-collector...  
{
    "Version": "v0.30.0",
    "NumCPU": 2
}
2023-07-05T16:11:16.930Z    info    extensions/extensions.go:30 Starting extensions...
2023-07-05T16:11:16.930Z    info    extensions/extensions.go:33 Extension is starting...    
{
    "kind": "extension",
    "name": "health_check"
}
2023-07-05T16:11:16.930Z    info    healthcheckextension@v0.78.0/healthcheckextension.go:34 Starting health_check extension 
{
    "kind": "extension",
    "name": "health_check",
    "config": {
        "Endpoint": "0.0.0.0:13133",
        "TLSSetting": null,
        "CORS": null,
        "Auth": null,
        "MaxRequestBodySize": 0,
        "IncludeMetadata": false,
        "Path": "/",
        "ResponseBody": null,
        "CheckCollectorPipeline": {
            "Enabled": false,
            "Interval": "5m",
            "ExporterFailureThreshold": 5
        }
    }
}
2023-07-05T16:11:16.931Z    warn    internal/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks    
{
    "kind": "extension",
    "name": "health_check",
    "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"
}
2023-07-05T16:11:16.931Z    info    extensions/extensions.go:37 Extension started.  
{
    "kind": "extension",
    "name": "health_check"
}
2023-07-05T16:11:16.931Z    info    extensions/extensions.go:33 Extension is starting...    
{
    "kind": "extension",
    "name": "sigv4auth"
}
2023-07-05T16:11:16.931Z    info    extensions/extensions.go:37 Extension started.  
{
    "kind": "extension",
    "name": "sigv4auth"
}
2023-07-05T16:11:16.934Z    info    internal/resourcedetection.go:125   began detecting resource information    
{
    "kind": "processor",
    "name": "resourcedetection",
    "pipeline": "metrics/application"
}
2023-07-05T16:11:16.952Z    info    internal/resourcedetection.go:139   detected resource information   
{
    "kind": "processor",
    "name": "resourcedetection",
    "pipeline": "metrics/application",
    "resource": {
        "aws.ecs.cluster.arn": "arn:aws:ecs:us-east-1:123031526709:cluster/ecs-monitoring-cluster",
        "aws.ecs.launchtype": "ec2",
        "aws.ecs.task.arn": "arn:aws:ecs:us-east-1:123031526709:task/ecs-monitoring-cluster/b8a1667bae08472bbdd482b4ebe35ee0",
        "aws.ecs.task.family": "ecs-monitoring-td",
        "aws.ecs.task.revision": "1",
        "aws.log.group.arns": [
            "arn:aws:logs:us-east-1:123031526709:log-group:/ecs/ecs-monitoring-td"
        ],
        "aws.log.group.names": [
            "/ecs/ecs-monitoring-td"
        ],
        "aws.log.stream.arns": [
            "arn:aws:logs:us-east-1:123031526709:log-group:/ecs/ecs-monitoring-td:log-stream:ecs/ecs-cms/b8a1667bae08472bbdd482b4ebe35ee0"
        ],
        "aws.log.stream.names": [
            "ecs/ecs-cms/b8a1667bae08472bbdd482b4ebe35ee0"
        ],
        "cloud.account.id": "123031526709",
        "cloud.availability_zone": "us-east-1b",
        "cloud.platform": "aws_ecs",
        "cloud.provider": "aws",
        "cloud.region": "us-east-1",
        "host.id": "2205ffdf-8508-42fc-8dcc-aa0afaef4718",
        "host.image.id": "ami-0e57cc6ff46c978c3",
        "host.name": "ip-172-31-123-52.ec2.internal",
        "host.type": "t3a.medium",
        "os.type": "linux"
    }
}
2023-07-05T16:11:16.953Z    warn    internal/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks    
{
    "kind": "receiver",
    "name": "otlp",
    "data_type": "metrics",
    "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"
}
2023-07-05T16:11:16.960Z    info    otlpreceiver@v0.78.2/otlp.go:83 Starting GRPC server    
{
    "kind": "receiver",
    "name": "otlp",
    "data_type": "metrics",
    "endpoint": "0.0.0.0:4317"
}
2023-07-05T16:11:16.960Z    warn    internal/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks    
{
    "kind": "receiver",
    "name": "otlp",
    "data_type": "metrics",
    "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"
}
2023-07-05T16:11:16.960Z    info    otlpreceiver@v0.78.2/otlp.go:101    Starting HTTP server    
{
    "kind": "receiver",
    "name": "otlp",
    "data_type": "metrics",
    "endpoint": "0.0.0.0:4318"
}
2023-07-05T16:11:16.960Z    info    healthcheck/handler.go:129  Health Check state change   
{
    "kind": "extension",
    "name": "health_check",
    "status": "ready"
}
2023-07-05T16:11:16.960Z    info    service/service.go:148  Everything is ready. Begin running and processing data.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2b23941]
goroutine 121 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsecscontainermetricsreceiver/internal/awsecscontainermetrics.getContainerMetrics(0xc0008ab300, 0xc0002b2800?)

What could be the reason for the communication failure? Any help would be appreciated.

Harsh Kanakhara