I'm trying to get the Ops Agent working and installed it with the following commands:

curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
sudo bash add-google-cloud-ops-agent-repo.sh --also-install

I see the following error in the output of journalctl -u google-cloud-ops-agent-opentelemetry-collector -xn:

otelopscol[2706]: 2022-02-06T21:50:36.140Z        info        exporterhelper/queued_retry.go:215        Exporting failed. Will retry the request after interval.        {"kind": "exporter", "name": "googlecloud", "error": "[rpc error: code = Unauthenticated desc = transport: per-RPC creds failed due to error: metadata: GCE metadata \"instance/service-accounts/default/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fmonitoring%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fmonitoring.read%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fmonitoring.write\" not defined; rpc error: code = Unauthenticated desc = transport: per-RPC creds failed due to error: metadata: GCE metadata \"instance/service-accounts/default/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fmonitoring%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fmonitoring.read%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fmonitoring.write\" not defined; rpc error: code = Unauthenticated desc = transport: per-RPC creds failed due to error: metadata: GCE metadata \"instance/service-accounts/default/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fmonitoring%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fmonitoring.read%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fmonitoring.write\" not defined]", "interval": "14.115202828s"}

The services otherwise look healthy and are running, but the UI reports that the Ops Agent isn't running, which I suspect is because no data is being sent back.

Here is the status of the agent services:

google-cloud-ops-agent-opentelemetry-collector.service - Google Cloud Ops Agent - Metrics Agent
     Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent-opentelemetry-collector.service; static; vendor preset: enabled)
     Active: active (running) since Sat 2022-02-05 04:38:41 UTC; 1 day 17h ago
    Process: 2690 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -service=otel -in /etc/google-clou>
   Main PID: 2706 (otelopscol)
      Tasks: 9 (limit: 2369)
     Memory: 193.6M
     CGroup: /system.slice/google-cloud-ops-agent-opentelemetry-collector.service
             └─2706 /opt/google-cloud-ops-agent/subagents/opentelemetry-collector/otelopscol --config=/run/google-cloud-ops-agent-o>

Feb 06 21:55:53 mongo-1 otelopscol[2706]:         /root/go/pkg/mod/go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/qu>
Feb 06 21:55:53 mongo-1 otelopscol[2706]: go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).s>
Feb 06 21:55:53 mongo-1 otelopscol[2706]:         /root/go/pkg/mod/go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/me>
Feb 06 21:55:53 mongo-1 otelopscol[2706]: go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
Feb 06 21:55:53 mongo-1 otelopscol[2706]:         /root/go/pkg/mod/go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/qu>
Feb 06 21:55:53 mongo-1 otelopscol[2706]: go.opentelemetry.io/collector/exporter/exporterhelper/internal.consumerFunc.consume
Feb 06 21:55:53 mongo-1 otelopscol[2706]:         /root/go/pkg/mod/go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/in>
Feb 06 21:55:53 mongo-1 otelopscol[2706]: go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).Star>
Feb 06 21:55:53 mongo-1 otelopscol[2706]:         /root/go/pkg/mod/go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/in>
Feb 06 21:55:53 mongo-1 otelopscol[2706]: 2022-02-06T21:55:53.145Z        info        exporterhelper/queued_retry.go:215        Exp>

● google-cloud-ops-agent.service - Google Cloud Ops Agent
     Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent.service; enabled; vendor preset: enabled)
     Active: active (exited) since Sat 2022-02-05 04:38:41 UTC; 1 day 17h ago
    Process: 2691 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -in /etc/google-cloud-ops-agent/co>
    Process: 2704 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
   Main PID: 2704 (code=exited, status=0/SUCCESS)

Feb 05 04:38:41 mongo-1 systemd[1]: Starting Google Cloud Ops Agent...
Feb 05 04:38:41 mongo-1 systemd[1]: Finished Google Cloud Ops Agent.

● google-cloud-ops-agent-fluent-bit.service - Google Cloud Ops Agent - Logging Agent
     Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent-fluent-bit.service; static; vendor preset: enabled)
     Active: active (running) since Sun 2022-02-06 15:05:35 UTC; 6h ago
    Process: 22138 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -service=fluentbit -in /etc/googl>
   Main PID: 22144 (fluent-bit)
      Tasks: 22 (limit: 2369)
     Memory: 29.0M
     CGroup: /system.slice/google-cloud-ops-agent-fluent-bit.service
             └─22144 /opt/google-cloud-ops-agent/subagents/fluent-bit/bin/fluent-bit --config /run/google-cloud-ops-agent-fluent-bi>

Feb 06 15:05:35 mongo-1 systemd[1]: google-cloud-ops-agent-fluent-bit.service: Scheduled restart job, restart counter is at 7.
Feb 06 15:05:35 mongo-1 systemd[1]: Stopped Google Cloud Ops Agent - Logging Agent.
Feb 06 15:05:35 mongo-1 systemd[1]: Starting Google Cloud Ops Agent - Logging Agent...
Feb 06 15:05:35 mongo-1 systemd[1]: Started Google Cloud Ops Agent - Logging Agent.
Feb 06 15:05:35 mongo-1 fluent-bit[22144]: Fluent Bit v1.8.12
Feb 06 15:05:35 mongo-1 fluent-bit[22144]: * Copyright (C) 2019-2021 The Fluent Bit Authors
Feb 06 15:05:35 mongo-1 fluent-bit[22144]: * Copyright (C) 2015-2018 Treasure Data
Feb 06 15:05:35 mongo-1 fluent-bit[22144]: * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
Feb 06 15:05:35 mongo-1 fluent-bit[22144]: * https://fluentbit.io
Sam Stoelinga

1 Answer

The issue was that the VM had no service account attached, so the agent could not obtain a token from the metadata server (hence the "not defined" errors in the log). The fix was:

  1. Create a service account.
  2. Grant the service account the Logs Writer and Monitoring Metric Writer roles.
  3. Stop the VM, edit it to use the newly created service account, and start it again.
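
For anyone who prefers the command line, the three steps can be sketched with gcloud roughly as follows. This is only a sketch under assumptions: the project ID, zone, and service account name are hypothetical placeholders, the VM name is assumed to match the hostname mongo-1 from the logs, and the role IDs roles/logging.logWriter and roles/monitoring.metricWriter are the IDs behind the Logs Writer and Monitoring Metric Writer roles. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical placeholders: replace with your own values.
PROJECT_ID="${PROJECT_ID:-my-project}"
ZONE="${ZONE:-us-central1-a}"
VM_NAME="${VM_NAME:-mongo-1}"      # assumed to match the hostname in the logs
SA_NAME="${SA_NAME:-ops-agent-sa}"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Dry run by default: print each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# 1. Create the service account.
run gcloud iam service-accounts create "$SA_NAME" \
  --project="$PROJECT_ID" --display-name="Ops Agent"

# 2. Grant the Logs Writer and Monitoring Metric Writer roles.
for role in roles/logging.logWriter roles/monitoring.metricWriter; do
  run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="serviceAccount:${SA_EMAIL}" --role="$role"
done

# 3. Stop the VM, attach the service account, and start the VM again.
run gcloud compute instances stop "$VM_NAME" \
  --project="$PROJECT_ID" --zone="$ZONE"
run gcloud compute instances set-service-account "$VM_NAME" \
  --project="$PROJECT_ID" --zone="$ZONE" \
  --service-account="$SA_EMAIL" --scopes=cloud-platform
run gcloud compute instances start "$VM_NAME" \
  --project="$PROJECT_ID" --zone="$ZONE"
```

Review the printed commands first, then rerun with DRY_RUN=0 to apply them. The cloud-platform scope is one reasonable choice here, since access is then governed by the two IAM roles.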

Note that a VM normally gets the Compute Engine default service account when it is created. In my case I created the VM and explicitly chose not to attach any service account, hence the issue.
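
To check from inside the VM whether it is in this state, one option is to ask the metadata server for the default service account's email and look at the HTTP status. This is a rough sketch: the function names are made up for illustration, and it assumes a 404 corresponds to the "not defined" error the agent logs, while 200 means an account is attached.

```shell
#!/usr/bin/env bash
# Sketch: check from inside the VM whether a service account is attached.
METADATA_URL="http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"

# Map the HTTP status of the metadata request to a short verdict.
# (Hypothetical helper; 404 is assumed to mean "no service account".)
classify_status() {
  case "$1" in
    200) echo "attached" ;;
    404) echo "missing" ;;
    *)   echo "unknown" ;;
  esac
}

# Query the metadata server; only meaningful when run on a Compute Engine VM.
check_service_account() {
  local status
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    -H 'Metadata-Flavor: Google' "$METADATA_URL") || status="000"
  classify_status "$status"
}
```

Calling check_service_account on the VM prints attached, missing, or unknown. The same check can be repeated after the VM has been restarted with the new service account.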

Sam Stoelinga
  • Your answer is incomplete. Compute Engine instances are assigned a service account by default and do not require creating a service account or assigning one. Add details explaining why your VM did not already have a default service account assigned. – John Hanley Feb 08 '22 at 06:46
  • I originally didn't require a service account or authentication to other services, so I decided to create the VM without a service account. Then a year or so later (which was last week) I wanted to install the Ops Agent and followed the installation steps. The Ops Agent install script should have detected that there was no SA associated with the VM and reported this back. – Sam Stoelinga Feb 08 '22 at 08:18
  • A service account is not presented to a virtual machine. The software can only request tokens from the metadata server and cannot directly detect your problem with a missing service account. I agree that a better method of reporting this issue is good. Consider creating a feature request at https://developers.google.com/issue-tracker – John Hanley Feb 08 '22 at 16:38
  • I will file an FR. I was aware that a VM gets a GCE SA by default when you create it, but others might not be, so good call-out on that! – Sam Stoelinga Feb 09 '22 at 18:24