I am trying to run a Spring Boot app in ECS with the ADOT collector as a sidecar.
In the ECS task definition, I create a task for the Spring Boot app, enable Use Metrics Collection, and select Amazon Managed Prometheus (OpenTelemetry Instrumentation), which adds the aws-otel-sidecar-collector container. For the network mode, I am using awsvpc.
ECS Task Definition
{
  "taskDefinitionArn": "arn:aws:ecs:us-east-1:334998782985:task-definition/ecs-monitoring-demo-td:20",
  "containerDefinitions": [
    {
      "name": "ecs-demo",
      "image": "334998782985.dkr.ecr.us-east-1.amazonaws.com/ecs-demo:cpu18",
      "cpu": 1024,
      "portMappings": [
        {
          "name": "ecs-demo-8080-tcp",
          "containerPort": 8080,
          "hostPort": 8080,
          "protocol": "tcp",
          "appProtocol": "http"
        }
      ],
      "essential": true,
      "environment": [
        {
          "name": "VM_OPTS",
          "value": "-Dotel.exporter.otlp.protocol=http/protobuf"
        },
        {
          "name": "SPRING_PROFILE",
          "value": "custom"
        }
      ],
      "environmentFiles": [],
      "mountPoints": [],
      "volumesFrom": [],
      "ulimits": [],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-create-group": "true",
          "awslogs-group": "/ecs/ecs-monitoring-demo-td",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    },
    {
      "name": "aws-otel-collector",
      "image": "public.ecr.aws/aws-observability/aws-otel-collector:v0.30.0",
      "cpu": 512,
      "portMappings": [
        {
          "containerPort": 2000,
          "hostPort": 2000,
          "protocol": "udp"
        },
        {
          "name": "aws-otel-collector-4317-tcp",
          "containerPort": 4317,
          "hostPort": 4317,
          "protocol": "tcp"
        },
        {
          "containerPort": 8125,
          "hostPort": 8125,
          "protocol": "udp"
        }
      ],
      "essential": true,
      "command": [
        "--config=/etc/ecs/ecs-amp.yaml"
      ],
      "environment": [
        {
          "name": "AWS_PROMETHEUS_ENDPOINT",
          "value": "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-6b99b650-420b-4239-aa5d-fd67b8ea6fbe/api/v1/remote_write"
        }
      ],
      "mountPoints": [],
      "volumesFrom": [],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-create-group": "true",
          "awslogs-group": "/ecs/ecs-aws-otel-sidecar-collector",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "family": "ecs-monitoring-demo-td",
  "executionRoleArn": "arn:aws:iam::334998782985:role/ecsTaskExecutionRole",
  "networkMode": "awsvpc",
  "revision": 20,
  "volumes": [],
  "status": "ACTIVE",
  "requiresAttributes": [
    {
      "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
    },
    {
      "name": "ecs.capability.execution-role-awslogs"
    },
    {
      "name": "com.amazonaws.ecs.capability.ecr-auth"
    },
    {
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
    },
    {
      "name": "ecs.capability.execution-role-ecr-pull"
    },
    {
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
    },
    {
      "name": "ecs.capability.task-eni"
    },
    {
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.29"
    }
  ],
  "placementConstraints": [],
  "compatibilities": [
    "EC2"
  ],
  "requiresCompatibilities": [
    "EC2"
  ],
  "cpu": "1536",
  "memory": "3072",
  "runtimePlatform": {
    "cpuArchitecture": "X86_64",
    "operatingSystemFamily": "LINUX"
  },
  "registeredAt": "2023-07-06T09:18:08.682Z",
  "registeredBy": "arn:aws:iam::334998782985:root",
  "tags": []
}
I am following https://www.baeldung.com/spring-boot-opentelemetry-setup to set up communication between the Spring Boot app and the OTEL Collector. The configuration in my Spring Boot app is as follows.
Dependencies in pom.xml
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-sleuth</artifactId>
        <exclusions>
            <exclusion>
                <groupId>org.springframework.cloud</groupId>
                <artifactId>spring-cloud-sleuth-brave</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-sleuth-otel-autoconfigure</artifactId>
    </dependency>
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-exporter-otlp</artifactId>
    </dependency>
</dependencies>
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>${spring-cloud.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-sleuth-otel-dependencies</artifactId>
            <version>1.1.3</version>
            <scope>import</scope>
            <type>pom</type>
        </dependency>
    </dependencies>
</dependencyManagement>
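To confirm which exporter these dependencies actually wire up, a small startup check like the one below could be used. This is only a sketch of my own: I am assuming the Sleuth OTel autoconfiguration exposes the exporter as a SpanExporter bean, and the class and bean names here are just illustrative.

import io.opentelemetry.sdk.trace.export.SpanExporter;
import org.springframework.beans.factory.ObjectProvider;
import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ExporterSanityCheck {

    // Prints every SpanExporter bean present at startup, so I can verify
    // that the OTLP gRPC exporter (and not some default) is the one in use.
    @Bean
    CommandLineRunner logConfiguredExporters(ObjectProvider<SpanExporter> exporters) {
        return args -> exporters.orderedStream().forEach(exporter ->
                System.out.println("Configured SpanExporter: " + exporter.getClass().getName()));
    }
}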
In application.properties
server.port = 8080
spring.application.name=ECS-Monitoring
logging.level.io.opentelemetry=debug
spring.sleuth.otel.config.trace-id-ratio-based=1.0
spring.sleuth.otel.exporter.otlp.endpoint=http://localhost:4318
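For context, my understanding is that spring-cloud-sleuth-otel-autoconfigure builds a gRPC OTLP span exporter from the endpoint property above, roughly like the explicit bean below. This is a sketch of my own, not the library's actual code; the timeout value is an assumption.

import java.time.Duration;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class OtlpExporterConfig {

    // Roughly what I expect Sleuth OTel to build from
    // spring.sleuth.otel.exporter.otlp.endpoint; note that it is a gRPC-based
    // exporter (OkHttpGrpcExporter shows up in the app logs below).
    @Bean
    public OtlpGrpcSpanExporter otlpGrpcSpanExporter(
            @Value("${spring.sleuth.otel.exporter.otlp.endpoint}") String endpoint) {
        return OtlpGrpcSpanExporter.builder()
                .setEndpoint(endpoint)            // currently http://localhost:4318
                .setTimeout(Duration.ofSeconds(30))
                .build();
    }
}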
In /etc/ecs/ecs-amp.yaml (created by AWS inside the ecs-aws-otel-sidecar-collector Docker container, and referenced by the --config option in the task definition above)
extensions:
  health_check:
  sigv4auth:
    region: $AWS_REGION
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  awsecscontainermetrics:
processors:
  batch/metrics:
    timeout: 60s
  resourcedetection:
    detectors:
      - env
      - system
      - ecs
      - ec2
  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          - ecs.task.memory.reserved
          - ecs.task.memory.utilized
          - ecs.task.cpu.reserved
          - ecs.task.cpu.utilized
          - ecs.task.network.rate.rx
          - ecs.task.network.rate.tx
          - ecs.task.storage.read_bytes
          - ecs.task.storage.write_bytes
          - container.duration
exporters:
  prometheusremotewrite:
    endpoint: $AWS_PROMETHEUS_ENDPOINT
    auth:
      authenticator: sigv4auth
    resource_to_telemetry_conversion:
      enabled: true
service:
  pipelines:
    metrics/application:
      receivers: [otlp]
      processors: [resourcedetection, batch/metrics]
      exporters: [prometheusremotewrite]
    metrics:
      receivers: [awsecscontainermetrics]
      processors: [filter]
      exporters: [prometheusremotewrite]
  extensions: [health_check, sigv4auth]
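Since both containers belong to the same task with awsvpc networking, they should share the task ENI and be reachable from each other over localhost. To rule out a basic networking problem, I can run a throwaway reachability probe like this from inside the app container (my own sketch; the ports come from the config above and the collector startup logs below, and any HTTP status in the output proves the port is reachable):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Throwaway probe: checks that the collector's health_check extension (13133)
// and OTLP/HTTP receiver (4318) answer on localhost from the app container.
public class CollectorProbe {

    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        String[] urls = {"http://localhost:13133/", "http://localhost:4318/v1/traces"};
        for (String url : urls) {
            try {
                HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
                HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
                System.out.println(url + " -> HTTP " + response.statusCode());
            } catch (Exception e) {
                System.out.println(url + " -> " + e);
            }
        }
    }
}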
Both of my Docker containers are running on the same EC2 instance. Below are the logs.
CloudWatch Logs for the Spring Boot app
2023-07-06 09:19:21.918 INFO [Custom-App,,] 1 --- [ main] com.ecs.demo.ecsdemo.EcsDemoApplication : Starting EcsDemoApplication v0.0.1-SNAPSHOT using Java 11.0.19 on ip-172-31-6-36.ec2.internal with PID 1 (/app/opx/app.jar started by root in /app/opx)
2023-07-06 09:19:21.934 INFO [Custom-App,,] 1 --- [ main] com.ecs.demo.ecsdemo.EcsDemoApplication : The following 1 profile is active: "custom"
2023-07-06 09:19:24.518 INFO [Custom-App,,] 1 --- [ main] o.s.cloud.context.scope.GenericScope : BeanFactory id=a51cc911-915e-33cf-8578-687892020128
2023-07-06 09:19:26.125 INFO [Custom-App,,] 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 8080 (http)
2023-07-06 09:19:26.144 INFO [Custom-App,,] 1 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
2023-07-06 09:19:26.145 INFO [Custom-App,,] 1 --- [ main] org.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/9.0.75]
2023-07-06 09:19:26.382 INFO [Custom-App,,] 1 --- [ main] o.a.c.c.C.[Tomcat].[localhost].[/env] : Initializing Spring embedded WebApplicationContext
2023-07-06 09:19:26.382 INFO [Custom-App,,] 1 --- [ main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 4201 ms
2023-07-06 09:19:27.642 DEBUG [Custom-App,,] 1 --- [ main] i.o.sdk.internal.JavaVersionSpecific : Using the APIs optimized for: Java 9+
2023-07-06 09:19:27.771 DEBUG [Custom-App,,] 1 --- [ main] .i.a.i.EmbeddedInstrumentationProperties : Did not find embedded instrumentation properties file META-INF/io/opentelemetry/instrumentation/org.springframework.cloud.sleuth.properties
2023-07-06 09:19:29.477 DEBUG [Custom-App,,] 1 --- [ main] .i.a.i.EmbeddedInstrumentationProperties : Did not find embedded instrumentation properties file META-INF/io/opentelemetry/instrumentation/io.micrometer.tracing.properties
2023-07-06 09:19:29.776 INFO [Custom-App,,] 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path '/env'
2023-07-06 09:19:29.822 INFO [Custom-App,,] 1 --- [ main] com.ecs.demo.ecsdemo.EcsDemoApplication : Started EcsDemoApplication in 9.871 seconds (JVM running for 11.198)
2023-07-06 09:19:34.533 DEBUG [Custom-App,,] 1 --- [nio-8080-exec-1] t.p.B3PropagatorExtractorMultipleHeaders : Invalid TraceId in B3 header: null'. Returning INVALID span context.
2023-07-06 09:19:34.535 DEBUG [Custom-App,,] 1 --- [nio-8080-exec-2] t.p.B3PropagatorExtractorMultipleHeaders : Invalid TraceId in B3 header: null'. Returning INVALID span context.
2023-07-06 09:19:34.580 INFO [Custom-App,befdbd9fc025dc9c146dbab3e090c576,1989f539f67fc492] 1 --- [nio-8080-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/env] : Initializing Spring DispatcherServlet 'dispatcherServlet'
2023-07-06 09:19:34.580 INFO [Custom-App,befdbd9fc025dc9c146dbab3e090c576,1989f539f67fc492] 1 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet : Initializing Servlet 'dispatcherServlet'
2023-07-06 09:19:34.582 INFO [Custom-App,befdbd9fc025dc9c146dbab3e090c576,1989f539f67fc492] 1 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet : Completed initialization in 1 ms
2023-07-06 09:19:34.719 INFO [Custom-App,f573665663bafab697a35f1290e0d0bc,8e608e509ca0edbc] 1 --- [nio-8080-exec-2] c.e.d.ecsdemo.RestControllerEndpoints : Hello World
2023-07-06 09:19:34.719 INFO [Custom-App,befdbd9fc025dc9c146dbab3e090c576,8f7a8fb281c42a82] 1 --- [nio-8080-exec-1] c.e.d.ecsdemo.RestControllerEndpoints : Hello World
2023-07-06 09:19:35.023 ERROR [Custom-App,,] 1 --- [alhost:4318/...] i.o.e.internal.grpc.OkHttpGrpcExporter : Failed to export spans. The request could not be executed. Full error message: Broken pipe (Write failed)
2023-07-06 09:19:35.024 DEBUG [Custom-App,,] 1 --- [alhost:4318/...] i.o.s.trace.export.SimpleSpanProcessor : Exporter failed
2023-07-06 09:19:35.034 ERROR [Custom-App,,] 1 --- [alhost:4318/...] i.o.e.internal.grpc.OkHttpGrpcExporter : Failed to export spans. The request could not be executed. Full error message: null
2023-07-06 09:19:35.034 DEBUG [Custom-App,,] 1 --- [alhost:4318/...] i.o.s.trace.export.SimpleSpanProcessor : Exporter failed
2023-07-06 09:19:35.047 ERROR [Custom-App,,] 1 --- [alhost:4318/...] i.o.e.internal.grpc.OkHttpGrpcExporter : Failed to export spans. The request could not be executed. Full error message: FRAME_SIZE_ERROR: 4740180
2023-07-06 09:19:35.048 DEBUG [Custom-App,,] 1 --- [alhost:4318/...] i.o.s.trace.export.SimpleSpanProcessor : Exporter failed
2023-07-06 09:19:35.046 ERROR [Custom-App,,] 1 --- [alhost:4318/...] i.o.e.internal.grpc.OkHttpGrpcExporter : Failed to export spans. The request could not be executed. Full error message: FRAME_SIZE_ERROR: 4740180
2023-07-06 09:19:35.048 DEBUG [Custom-App,,] 1 --- [alhost:4318/...] i.o.s.trace.export.SimpleSpanProcessor : Exporter failed
CloudWatch Logs for ecs-aws-otel-sidecar-collector
2023/07/05 16:11:16 ADOT Collector version: v0.30.0
2023/07/05 16:11:16 found no extra config, skip it, err: open /opt/aws/aws-otel-collector/etc/extracfg.txt: no such file or directory
2023/07/05 16:11:16 attn: users of the prometheus receiver, prometheus exporter or prometheusremotewrite exporter please refer to https://github.com/aws-observability/aws-otel-collector/issues/2043 in regards to an ADOT Collector v0.31.0 breaking change
2023-07-05T16:11:16.924Z info service/telemetry.go:104 Setting up own telemetry...
2023-07-05T16:11:16.925Z info service/telemetry.go:127 Serving Prometheus metrics
{
"address": ":8888",
"level": "Basic"
}
2023-07-05T16:11:16.928Z info filterprocessor@v0.78.0/metrics.go:90 Metric filter configured
{
"kind": "processor",
"name": "filter",
"pipeline": "metrics",
"include match_type": "strict",
"include expressions": [],
"include metric names": [
"ecs.task.memory.reserved",
"ecs.task.memory.utilized",
"ecs.task.cpu.reserved",
"ecs.task.cpu.utilized",
"ecs.task.network.rate.rx",
"ecs.task.network.rate.tx",
"ecs.task.storage.read_bytes",
"ecs.task.storage.write_bytes",
"container.duration"
],
"include metrics with resource attributes": null,
"exclude match_type": "",
"exclude expressions": [],
"exclude metric names": [],
"exclude metrics with resource attributes": null
}
2023-07-05T16:11:16.929Z info service/service.go:131 Starting aws-otel-collector...
{
"Version": "v0.30.0",
"NumCPU": 2
}
2023-07-05T16:11:16.930Z info extensions/extensions.go:30 Starting extensions...
2023-07-05T16:11:16.930Z info extensions/extensions.go:33 Extension is starting...
{
"kind": "extension",
"name": "health_check"
}
2023-07-05T16:11:16.930Z info healthcheckextension@v0.78.0/healthcheckextension.go:34 Starting health_check extension
{
"kind": "extension",
"name": "health_check",
"config": {
"Endpoint": "0.0.0.0:13133",
"TLSSetting": null,
"CORS": null,
"Auth": null,
"MaxRequestBodySize": 0,
"IncludeMetadata": false,
"Path": "/",
"ResponseBody": null,
"CheckCollectorPipeline": {
"Enabled": false,
"Interval": "5m",
"ExporterFailureThreshold": 5
}
}
}
2023-07-05T16:11:16.931Z warn internal/warning.go:40 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks
{
"kind": "extension",
"name": "health_check",
"documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"
}
2023-07-05T16:11:16.931Z info extensions/extensions.go:37 Extension started.
{
"kind": "extension",
"name": "health_check"
}
2023-07-05T16:11:16.931Z info extensions/extensions.go:33 Extension is starting...
{
"kind": "extension",
"name": "sigv4auth"
}
2023-07-05T16:11:16.931Z info extensions/extensions.go:37 Extension started.
{
"kind": "extension",
"name": "sigv4auth"
}
2023-07-05T16:11:16.934Z info internal/resourcedetection.go:125 began detecting resource information
{
"kind": "processor",
"name": "resourcedetection",
"pipeline": "metrics/application"
}
2023-07-05T16:11:16.952Z info internal/resourcedetection.go:139 detected resource information
{
"kind": "processor",
"name": "resourcedetection",
"pipeline": "metrics/application",
"resource": {
"aws.ecs.cluster.arn": "arn:aws:ecs:us-east-1:123031526709:cluster/ecs-monitoring-cluster",
"aws.ecs.launchtype": "ec2",
"aws.ecs.task.arn": "arn:aws:ecs:us-east-1:123031526709:task/ecs-monitoring-cluster/b8a1667bae08472bbdd482b4ebe35ee0",
"aws.ecs.task.family": "ecs-monitoring-td",
"aws.ecs.task.revision": "1",
"aws.log.group.arns": [
"arn:aws:logs:us-east-1:123031526709:log-group:/ecs/ecs-monitoring-td"
],
"aws.log.group.names": [
"/ecs/ecs-monitoring-td"
],
"aws.log.stream.arns": [
"arn:aws:logs:us-east-1:123031526709:log-group:/ecs/ecs-monitoring-td:log-stream:ecs/ecs-cms/b8a1667bae08472bbdd482b4ebe35ee0"
],
"aws.log.stream.names": [
"ecs/ecs-cms/b8a1667bae08472bbdd482b4ebe35ee0"
],
"cloud.account.id": "123031526709",
"cloud.availability_zone": "us-east-1b",
"cloud.platform": "aws_ecs",
"cloud.provider": "aws",
"cloud.region": "us-east-1",
"host.id": "2205ffdf-8508-42fc-8dcc-aa0afaef4718",
"host.image.id": "ami-0e57cc6ff46c978c3",
"host.name": "ip-172-31-123-52.ec2.internal",
"host.type": "t3a.medium",
"os.type": "linux"
}
}
2023-07-05T16:11:16.953Z warn internal/warning.go:40 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks
{
"kind": "receiver",
"name": "otlp",
"data_type": "metrics",
"documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"
}
2023-07-05T16:11:16.960Z info otlpreceiver@v0.78.2/otlp.go:83 Starting GRPC server
{
"kind": "receiver",
"name": "otlp",
"data_type": "metrics",
"endpoint": "0.0.0.0:4317"
}
2023-07-05T16:11:16.960Z warn internal/warning.go:40 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks
{
"kind": "receiver",
"name": "otlp",
"data_type": "metrics",
"documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"
}
2023-07-05T16:11:16.960Z info otlpreceiver@v0.78.2/otlp.go:101 Starting HTTP server
{
"kind": "receiver",
"name": "otlp",
"data_type": "metrics",
"endpoint": "0.0.0.0:4318"
}
2023-07-05T16:11:16.960Z info healthcheck/handler.go:129 Health Check state change
{
"kind": "extension",
"name": "health_check",
"status": "ready"
}
2023-07-05T16:11:16.960Z info service/service.go:148 Everything is ready. Begin running and processing data.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2b23941]
goroutine 121 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsecscontainermetricsreceiver/internal/awsecscontainermetrics.getContainerMetrics(0xc0008ab300, 0xc0002b2800?)
What could be the reason for the communication failure? Any help would be appreciated.