
I'm running a container-optimized compute instance with this startup-script:

#!/bin/bash

mkdir /home/my-app
cd /home/my-app
export HOME=/home/my-app

docker-credential-gcr configure-docker


docker run --rm --log-driver=gcplogs --name my-app --security-opt seccomp=./config.json gcr.io/my-project/my-app:latest

The `--log-driver` and `--name` flags are set according to the GCP community guide and the Docker docs.

Yet I see no logs from the container's startup.

Also, when I SSH into the instance and run `logger "hello from logger"`, the message doesn't show up in Cloud Logging. I've tried converting to an advanced filter and removing all filtering except the "hello from logger" string filter.

How do I properly set up logging? I'm using bunyan inside my NodeJS app, but when the app fails I have no visibility at all. I'd love to have all the journalctl logs in Cloud Logging, or at least the startup-script portion of them. Right now I'm retrieving them by SSHing into the instance and running `journalctl -r | grep startup-script`.

Update

Access scopes are correctly set:

Stackdriver Logging API: Write Only
Stackdriver Monitoring API: Write Only

I'm using the default Compute Engine service account. Here are the commands I'm creating this VM with:

gcloud compute instance-templates create $APP_ID-template \
    --scopes=bigquery,default,compute-rw,storage-rw \
    --image-project=cos-cloud \
    --image-family=cos-77-lts \
    --machine-type=e2-medium \
    --metadata-from-file=startup-script=./start.sh \
    --tags=http-server,https-server

gcloud compute instance-groups managed create $APP_ID-group \
    --size=1 \
    --template=$APP_ID-template

Startup-script:

#!/bin/bash

mkdir /home/startDir
cd /home/startDir
export HOME=/home/startDir

docker-credential-gcr configure-docker

docker run --log-driver=gcplogs --name my-app --security-opt seccomp=./config.json gcr.io/project-id/app:latest

This VM runs a NodeJS script. I'm not providing JSON keys to it. The bunyan logger correctly sends logs to Cloud Logging; it only fails to send logs when the server crashes completely.

Logging API is enabled. Running `sudo systemctl status stackdriver-logging` in the VM gives:

● stackdriver-logging.service - Fluentd container for Stackdriver Logging
   Loaded: loaded (/usr/lib/systemd/system/stackdriver-logging.service; static; vendor preset: disabled)
   Active: inactive (dead)

  • Does this work for you? ``` resource.type="gce_instance" resource.labels.instance_id="xxxxxxxxxxxxxxxxxxxxxxx" protoPayload.metadata.instanceMetadataDelta.addedMetadataKeys="startup-script" ``` – Mahboob Jan 14 '21 at 16:52
  • no, it doesn't. I know how to search for logs. Startup-script logs show up when I'm running a regular instance. This problem is only present with Container-Optimized OS – stkvtflw Jan 15 '21 at 04:40
  • I do get the container logs in Cloud Logging under "VM instance" when running the container from startup script using cos-stable-85-13310-1041-161. Can you confirm that you see those logs when running the container locally (without --log-driver=gcplogs)? Do you see the logs in Cloud Logging when using another boot disk image, for example Ubuntu? – LundinCast Jan 17 '21 at 11:31
  • I read your update. When you created the instance, you need to add this flag to enable logging: `--metadata=google-logging-enabled=true` – John Hanley Jan 18 '21 at 04:43
  • Also, why are you using image family `cos-77-lts`? – John Hanley Jan 18 '21 at 04:44
  • @JohnHanley because this is the latest Container optimized OS? – stkvtflw Jan 18 '21 at 04:45
  • Which image is being used? The current image is cos-stable-85 ... – John Hanley Jan 18 '21 at 04:46
  • Try updating the image family to `cos-stable` or `cos-85-lts`. You are using an old image. – John Hanley Jan 18 '21 at 04:47
  • the `--metadata=google-logging-enabled=true` flag helped. Image family is not related. The only remaining problem is that the `startup-script` (the most important) part of logs not showing up. Also, why on earth is this flag not mentioned anywhere? How did you figure it out? – stkvtflw Jan 18 '21 at 05:02
  • any idea why `startup-script` logs are not showing up? – stkvtflw Jan 18 '21 at 05:42
  • Which COS version are you using now? – John Hanley Jan 18 '21 at 05:50
  • I looked into this further. The startup-scripts logging uses a priority level that is not sent to Stackdriver. Edit the file `/etc/stackdriver/logging.config.d/fluentd-lakitu.conf` Look for the section `Collects all journal logs with priority >= warning`. The PRIORITY is 0 -> 4. If you add "5" and "6" to the list, then the startup-scripts are logged in Operations Logging. However, this change is not persistent across reboots. The question now is how to make this change persistent. – John Hanley Jan 18 '21 at 07:19
  • @JohnHanley pls add the details from the comments to your answer. Thanks! – stkvtflw Jan 21 '21 at 12:42

2 Answers


Google Compute Engine Container-Optimized OS has Operations Logging (formerly Stackdriver) enabled by default.

In my list of problems and solutions, Problem #3 is the most common in my experience.

Possible Problem #1:

By default, new instances have the following scopes enabled:

  • Stackdriver Logging API: Write Only
  • Stackdriver Monitoring API: Write Only

If you have modified the instance's Access Scopes, make sure that the Stackdriver scopes are enabled. This requires stopping the instance to modify scopes.

Possible Problem #2:

If you are using a custom service account for this instance, make sure the service account has at least the role roles/logging.logWriter. Without this role or similar, the logger will fail.
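As an illustration, granting that role could look like the following sketch. The project and service-account names here are placeholders, not taken from the question:

```shell
# Grant the Logs Writer role to a custom service account
# (hypothetical names; substitute your own project and account).
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:my-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/logging.logWriter"
```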

Possible Problem #3:

A common problem is that the Project Owner did not enable the Cloud Logging API. Without this API enabled, the instance logger will fail.

To verify if the logger within the instance is failing, SSH into the instance and execute this command:

sudo systemctl status stackdriver-logging

If you see error messages related to the logging API, then enable the Cloud Logging API.
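If the status output alone is not conclusive, the agent's recent log lines can be inspected with standard systemd tooling (a sketch; the unit name comes from the status output above):

```shell
# Print the last 50 log entries for the logging agent's unit;
# -u filters by unit name, --no-pager writes straight to stdout.
sudo journalctl -u stackdriver-logging --no-pager -n 50
```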

Enable the Cloud Logging API via the CLI:

gcloud services enable logging.googleapis.com --project=<PROJECT_ID>

Or via the Google Cloud Console:

https://console.cloud.google.com/apis/library/logging.googleapis.com

Possible Problem #4:

When creating an instance via the CLI, you need to specify the following command-line option, otherwise the logging service will not start:

--metadata=google-logging-enabled=true
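Applied to the template-creation command from the question, the flag slots in alongside the existing metadata option (a sketch reusing the OP's flags; `--metadata` and `--metadata-from-file` can be passed together, and the image family is bumped to `cos-stable` per the comments above):

```shell
# --metadata=google-logging-enabled=true starts the logging agent at boot.
gcloud compute instance-templates create $APP_ID-template \
    --scopes=bigquery,default,compute-rw,storage-rw \
    --image-project=cos-cloud \
    --image-family=cos-stable \
    --machine-type=e2-medium \
    --metadata=google-logging-enabled=true \
    --metadata-from-file=startup-script=./start.sh \
    --tags=http-server,https-server
```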

[UPDATE 01/22/2021]

The OP had two problems: 1) the Stackdriver service was not running, which the steps above solved; 2) startup-script log entries were not reaching Stackdriver.

The current configuration for Container OS has the log level set too low to send startup-script logs to Stackdriver.

The log level is set by the file /etc/stackdriver/logging.config.d/fluentd-lakitu.conf.

Look for the section "Collects all journal logs with priority >= warning". The PRIORITY list is 0 -> 4. If you add "5" and "6" to the list, the startup-script entries are logged in Operations Logging.

You can change the log level but this change does not persist across reboots. I have not found a solution to make changes permanent.
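One possible (untested) direction is to re-apply the edit from the instance's own startup script before restarting the agent. The snippet below only demonstrates the substitution on a stand-in file: the exact contents of `fluentd-lakitu.conf` are an assumption here, so adapt the `sed` pattern to what the real filter section looks like on your instance:

```shell
# Stand-in for the journal-priority filter section of
# /etc/stackdriver/logging.config.d/fluentd-lakitu.conf
# (hypothetical contents; check the real file on your instance).
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
# Collects all journal logs with priority >= warning
filters [{ "PRIORITY": ["0", "1", "2", "3", "4"] }]
EOF

# Widen the filter to also pass notice (5) and info (6),
# the levels where startup-script output lands.
sed -i 's/"4"\]/"4", "5", "6"]/' "$CONF"
cat "$CONF"

# On a real instance you would then restart the agent:
#   sudo systemctl restart stackdriver-logging
```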

John Hanley
  • #1 - all good. #2 - using default compute engine service account. #3 - logging enabled. Here is the first command output: `● stackdriver-logging.service - Fluentd container for Stackdriver Logging Loaded: loaded (/usr/lib/systemd/system/stackdriver-logging.service; static; vendor preset: disabled) Active: inactive (dead)` – stkvtflw Jan 18 '21 at 04:34
  • `inactive (dead)` means the service is not running. Reboot the instance, ssh login and then check again. See if there is an error for the `stackdriver-logging` service. – John Hanley Jan 18 '21 at 04:36
  • i've just resized the Compute Group. If it's dead now, it's dead in 100% of cases. Check out the question, I've added more details. – stkvtflw Jan 18 '21 at 04:42

I'm able to see the startup-script logs in Cloud Logging using the following advanced filter:

resource.type="gce_instance"
resource.labels.instance_id="1234567890"
protoPayload.metadata.instanceMetadataDelta.addedMetadataKeys="startup-script"

As per the GCP docs, to view the startup script logs you can log in to the instance, where the startup-script output is written to the following log files:

  • CentOS and RHEL: /var/log/messages
  • Debian: /var/log/daemon.log
  • Ubuntu: /var/log/syslog
  • SLES: /var/log/messages

To save some time, you can use this command to see the logs:

gcloud compute ssh instance-id --project your-project --zone us-central1-a --command="sudo journalctl -r | grep startup-script"
Mahboob
  • when I search just "some-string", it finds logs with "some-string" in metadata as well. I've tested your suggestion anyways. Nope, didn't work. Startup-script logs are just not there. – stkvtflw Jan 15 '21 at 04:43