11

I followed the instructions at https://cloud.google.com/monitoring/agent/install-agent#linux-install:

$ curl -O "https://repo.stackdriver.com/stack-install.sh"
$ sudo bash stack-install.sh --write-gcm
Unidentifiable or unsupported platform.

The content of /etc/os-release.

$ cat /etc/os-release
BUILD_ID=8820.0.0
NAME="Container-VM Image"
GOOGLE_CRASH_ID=Lakitu
VERSION_ID=55
BUG_REPORT_URL=https://crbug.com/new
PRETTY_NAME="Google Container-VM Image"
VERSION=55
GOOGLE_METRICS_PRODUCT_ID=26
HOME_URL="https://cloud.google.com/compute/docs/containers/vm-image/"
ID=gci
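
The "Unidentifiable or unsupported platform" message is consistent with the `ID=gci` value above. As a rough sketch of how such a platform check might look, the helpers below read the `ID` field from an os-release file. Both function names and the list of supported IDs are assumptions for illustration; they are not taken from stack-install.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ID field of an os-release style file.
os_release_id() {
  local file="${1:-/etc/os-release}"
  # Source the file in a subshell so its variables do not leak into the caller.
  ( unset ID; . "$file" 2>/dev/null && printf '%s\n' "${ID:-unknown}" )
}

# Hypothetical check: the supported list below is an assumption,
# not the installer's real logic.
is_supported_platform() {
  case "$(os_release_id "$1")" in
    debian|ubuntu|centos|rhel|amzn|sles) return 0 ;;
    *) return 1 ;;
  esac
}
```

With the file shown above, `os_release_id` prints `gci`, which is not in the assumed list, so `is_supported_platform` returns non-zero.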

https://cloud.google.com/compute/docs/containers/vm-image/faq#what_is_the_software_package_manager_for_container-vm_image

In order to update a particular package, the entire OS image needs to be updated

So it seems we must either wait for an image update that ships with the Stackdriver agent installed, or give up.

Also, this VM image was not my choice: newly created GKE nodes use Container-VM images by default. For now I'll try creating nodes with a different image via gcloud container node-pools create --image-type
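
A sketch of what that command could look like. The function name, the `DRY_RUN` switch, and the cluster and pool names are hypothetical, and the `container_vm` value is an assumption; `gcloud container node-pools create --help` lists the image types a given gcloud version accepts.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: create a node pool with an explicit image type.
# Usage: create_pool_with_image CLUSTER POOL
create_pool_with_image() {
  local cluster="$1" pool="$2"
  # When DRY_RUN is set and non-empty, print the command instead of running it.
  ${DRY_RUN:+echo} gcloud container node-pools create "$pool" \
    --cluster="$cluster" \
    --image-type=container_vm
}
```

Running it with `DRY_RUN=1` first makes it possible to review the exact command before any nodes are created.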

approxiblue
hiroshi
  • On GCE default images, the nodes already have stackdriver installed. Or at least they have the fluentd logger which forwards things to google/stackdriver – Matej Oct 07 '16 at 16:42
  • 1
    Really? Since Stackdriver is still in beta, they didn't pre-install the agents. I asked cloud service support about this once. If the agents are not installed, we cannot monitor memory usage. – hiroshi Oct 08 '16 at 09:17
  • 1
    I see, maybe not for Memory.. But I see logs from kubernetes apps in stackdriver logs w/o needing to do anything. Are you using GKE? – Matej Oct 08 '16 at 11:27
  • Yes, I use GKE. As you said, I can see logs. – hiroshi Oct 09 '16 at 04:19
  • 1
    For GKE, monitoring works out of the box and doesn't require the Stackdriver agent. If you go to your Stackdriver UI, you can view your GKE clusters as first-class entities. Since the new Container-VM image is optimized for security and containers, we are working on containerizing the Stackdriver agent. Sorry for the inconvenience! – Vishnu Kannan Oct 13 '16 at 19:16
  • 1
    @VishnuKannan: As hiroshi pointed out above, monitoring only includes _external_ metrics, which excludes memory. We run into OOM issues quite often in GKE, so it's important to track memory usage on our nodes (as Node events are only persisted for an hour). The GKE UI option "Add cloud monitoring" implies that the agent will be installed, but it isn't (and can't be on GCI). Really looking forward to a fix for this. – jwadsack Nov 10 '16 at 17:39

4 Answers

8

You can enable the Stackdriver Monitoring agent on Container-Optimized OS VM instances. Run this command, then restart the instance:

gcloud compute instances add-metadata instance-name --metadata=google-monitoring-enabled=true
kubii2
  • 1
    Thanks for the answer, but I cannot find any documentation about the google-monitoring-enabled metadata. How do you know this? – hiroshi Jan 24 '20 at 00:02
  • 2
    I had a discussion with Google support and they shared this and it works. – kubii2 Jan 27 '20 at 10:16
  • The metadata `google-monitoring-enabled=true` will install the "node problem detector", which is not the Stackdriver Monitoring agent. It cannot collect the metrics that Stackdriver can, such as memory utilization and disk usage. https://github.com/kubernetes/node-problem-detector https://cloud.google.com/container-optimized-os/docs/how-to/monitoring#metadata – gecko655 May 24 '21 at 09:55
7

You can start the agents with systemctl:

sudo systemctl start stackdriver-logging
sudo systemctl start stackdriver-monitoring

It will spin up some containers with the agent running. Data will show up in your stackdriver dashboard a few minutes later.

I didn't find this documented anywhere, so I can't tell exactly which images include it, but I tested it on Container-Optimized OS 77-12371.114.0 stable.
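
To check whether the units actually came up, something like the following could work. It is only a sketch: the function name is hypothetical, the unit names are the two from the commands above, and whether those units exist on a given image is an open question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start each Stackdriver unit if it is not already
# active, then report the state systemd sees for it.
ensure_stackdriver_units() {
  local unit
  for unit in stackdriver-logging stackdriver-monitoring; do
    if ! systemctl is-active --quiet "$unit"; then
      sudo systemctl start "$unit"
    fi
    # is-active prints the unit state, e.g. "active", "inactive" or "failed".
    echo "$unit: $(systemctl is-active "$unit")"
  done
}
```

Calling `ensure_stackdriver_units` on the node prints one line per unit; anything other than `active` suggests the image does not ship the unit or that it failed to start.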

Guillaume
  • Thanks for your answer. Next time I create a node with Google Container VM image I'll try it. – hiroshi Nov 07 '19 at 00:24
  • Great answer, this should really be added to the official docs on Container-Optimized OS. I have sent GCP feedback asking them to include it. – eyeezzi Jul 13 '20 at 22:13
6

As far as I know (and as Google has confirmed to me), the new Chromium OS image does not currently support the Stackdriver agent. As a workaround, I switched the node pool back to ‘container-vm’ (the Debian-based image) with the following command:

$ gcloud container clusters upgrade YOUR_CLUSTER_NAME --image-type=container_vm --node-pool=YOUR_NODE_POOL

Replace the cluster name, and set the node pool name to the one that was upgraded to gci earlier (in my case, 'default-pool'). The nodes will be upgraded to the newest version; you can, however, add an option to deploy a different version.

You should now be able to install the Stackdriver agent just as you are used to and set up your desired custom metrics.

codemoped
2

I got around the agent's incompatibility with the new Chromium OS image by deploying the agent as a privileged container (conveniently already built: https://github.com/wikiwi/stackdriver-agent) in a Kubernetes DaemonSet, so it runs on every host. Here's the YAML I ended up using (indentation matters):

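# Note: extensions/v1beta1 no longer serves DaemonSet as of Kubernetes 1.16;
# newer clusters need apiVersion: apps/v1 plus a spec.selector matching the labels.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: stackdriver-agent
spec:
  template:
    metadata:
      labels:
        app: stackdriver-agent
    spec:
      containers:
      - name: stackdriver-agent
        image: wikiwi/stackdriver-agent
        # Privileged mode lets the agent read host-level metrics.
        securityContext:
          privileged: true
        # The host's /proc is exposed to the container at /mnt/proc.
        volumeMounts:
        - mountPath: /mnt/proc
          name: procmnt
        env:
        - name: MONITOR_HOST
          value: "true"
      volumes:
      - name: procmnt
        hostPath:
          path: /proc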
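
One way to apply and check the manifest, as a sketch. The function name and the default file name `stackdriver-agent.yaml` are hypothetical; the DaemonSet name and the `app=stackdriver-agent` label come from the YAML above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the DaemonSet manifest and show its rollout state.
deploy_stackdriver_agent() {
  local manifest="${1:-stackdriver-agent.yaml}"  # hypothetical file name
  kubectl apply -f "$manifest" || return 1
  # Shows desired vs. ready pod counts for the DaemonSet.
  kubectl get daemonset stackdriver-agent
  # -o wide adds the node each agent pod was scheduled on.
  kubectl get pods -l app=stackdriver-agent -o wide
}
```

If the DaemonSet is healthy, the pod listing should show one agent pod per schedulable node.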
Woody