4

I have a Google Kubernetes Engine cluster which until recently was happily pulling private container images from a Google Container Registry bucket. I haven't changed anything, but now when I update my Kubernetes Deployments, it's unable to launch new pods, and I get the following events:

Normal   Pulling  14s                kubelet, <node-id>  pulling image "gcr.io/cloudsql-docker/gce-proxy:latest"
Normal   Pulling  14s                kubelet, <node-id>  pulling image "gcr.io/<project-id>/backend:62d634e"
Warning  Failed   14s                kubelet, <node-id>  Failed to pull image "gcr.io/<project-id>/backend:62d634e": rpc error: code = Unknown desc = unauthorized: authentication required
Warning  Failed   14s                kubelet, <node-id>  Error: ErrImagePull
Normal   Pulled   13s                kubelet, <node-id>  Successfully pulled image "gcr.io/cloudsql-docker/gce-proxy:latest"
Normal   Created  13s                kubelet, <node-id>  Created container
Normal   Started  13s                kubelet, <node-id>  Started container
Normal   BackOff  11s (x2 over 12s)  kubelet, <node-id>  Back-off pulling image "gcr.io/<project-id>/backend:62d634e"
Warning  Failed   11s (x2 over 12s)  kubelet, <node-id>  Error: ImagePullBackOff

I've checked the following things, which all seem to be as they should:

  • The images and their tags actually exist and are correct.
  • The node pool / VM instances for the GKE cluster have the storage-ro access scope (see the checks sketched below)
  • The Google Container Registry bucket and GKE cluster are in the same project
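
For reference, these are roughly the checks I ran (a sketch; pool, cluster, and zone are placeholders):

gcloud container node-pools describe <pool-name> \
    --cluster=<cluster-name> --zone=<zone> \
    --format="value(config.oauthScopes)"

gcloud container images list-tags gcr.io/<project-id>/backend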

I've also tried disabling and re-enabling the container.googleapis.com and containerregistry.googleapis.com services, but that doesn't help.

The Google documentation for the Container Registry states:

Kubernetes Engine clusters are automatically configured with access to pull private images from the Container Registry in the same project. You do not need to follow additional steps to configure authentication if the registry and the cluster are in the same Cloud project.

But this doesn't seem to be the case.

Can anyone shed additional light on what might be going on? Or additional steps to try?

6 Answers

7

In my case, the issue turned out to be that the node pools generated by a minimal spec file were missing the OAuth scopes that grant access to the registry. Adding

nodePools:
- config:
    oauthScopes:
    - https://www.googleapis.com/auth/devstorage.read_only
    - https://www.googleapis.com/auth/servicecontrol
    - https://www.googleapis.com/auth/service.management.readonly
    - https://www.googleapis.com/auth/trace.append

to my spec fixed things. I think it's the devstorage scope that's the important one, but I'm not sure since I just copy-pasted the whole list of scopes from the spec the web console generates.
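
For what it's worth, a gcloud sketch that should create a node pool with the same scopes (pool, cluster, and zone names are placeholders; the short names are gcloud's scope aliases, e.g. storage-ro expands to devstorage.read_only):

gcloud container node-pools create <pool-name> \
    --cluster=<cluster-name> --zone=<zone> \
    --scopes=storage-ro,service-control,service-management,trace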

Andy Jones
  • I think just `https://www.googleapis.com/auth/devstorage.read_only` is needed. At least that was all I added to the service account I was using. – James Hiew May 06 '19 at 17:06
  • "I just copy-pasted the whole list of scopes from the spec the web console generates" how did you do it? – Paweł Szczur Sep 18 '19 at 07:32
  • Go to the "Create cluster" wizard, then look for the "Equivalent REST" button at the bottom of the screen. – Andy Jones Sep 19 '19 at 11:57
  • If this weren't a post from last year I'd assume it's a GKE version issue but I don't understand why I didn't need these scopes before in other clusters in other projects. I'm baffled. – Nathan McKaskle Jan 31 '23 at 17:17
  • Oh wait nevermind I just realized it's because I already have those oauth scopes there in the cluster config. They already exist for each node pool and therefore this answer is wrong. – Nathan McKaskle Jan 31 '23 at 17:20
5

Ok, this turned out to be tricky, but the cause was this:

I used Terraform to set the service account for the nodes in the GKE cluster, but to specify the service account I used the unique_id output of the google_service_account resource rather than the email output. Both Terraform and the Google Cloud API accepted this without complaint.

When Kubernetes (and other things) tried to access the internal metadata API on each node to get a token it could use, it received a Service account is invalid/disabled response with a 403 status.
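
This is easy to reproduce from a shell on a node, since the token comes from the standard GCE metadata endpoint:

curl -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"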

Recreating the node pool with the correctly specified service account fixed the problem.
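
In Terraform terms, the difference is just which attribute you reference (resource names here are hypothetical):

resource "google_service_account" "gke_nodes" {
  account_id = "gke-nodes"
}

resource "google_container_node_pool" "nodes" {
  # ...other required arguments omitted...
  node_config {
    # Correct: node_config expects the service account's email address.
    service_account = google_service_account.gke_nodes.email

    # Accepted by the API, but breaks token issuance on the node:
    # service_account = google_service_account.gke_nodes.unique_id
  }
}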

  • For me, it turned out that my terraform template google_storage_bucket_acl was actually REMOVING the ACL for the service account, even though no reference to it was made anywhere in the template. Dropping this here in case anyone falls into the same trap. – Jean-Bernard Jansen Mar 07 '19 at 15:06
  • Mine is set to the default account, so it's not this. – Nathan McKaskle Jan 31 '23 at 17:22
3

In my case, setting the correct OAuth scopes didn't work, so I configured it the way I would for any other private registry: by adding imagePullSecrets to my Pod spec.

Kubernetes Docs | Pull an Image from a Private Registry

Sample script to generate registry credentials in a pipeline

You could do this manually as well if you don't manage your infrastructure as code right now.

# Set up registry credentials so we can pull images from GCR
gcloud auth print-access-token | docker login -u oauth2accesstoken --password-stdin https://gcr.io

# Store the Docker config as an image pull secret; --dry-run piped into
# kubectl apply makes this idempotent (create or update if already created)
kubectl create secret generic regcred \
    --namespace=development \
    --from-file=.dockerconfigjson="${HOME}/.docker/config.json" \
    --type=kubernetes.io/dockerconfigjson \
    --output yaml --dry-run | kubectl apply -f -
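
Note that the token from gcloud auth print-access-token is short-lived (about an hour), so this fits a pipeline that refreshes the secret on each deploy. A quick sanity check afterwards:

kubectl get secret regcred --namespace=development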

Sample Deployment File

(Don't mind all the substitutions; they aren't relevant. Just check the last two lines of the YAML file.)

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: ${NAMESPACE}
  name: ${PROJECT_PREFIX}-${PROJECT_TYPE}-${PROJECT_NAME}
  labels:
    name: ${PROJECT_PREFIX}-${PROJECT_TYPE}-${PROJECT_NAME}
spec:
  replicas: ${REPLICA_COUNT}
  selector:
    matchLabels:
      name: ${PROJECT_PREFIX}-${PROJECT_TYPE}-${PROJECT_NAME}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
  template:
    metadata:
      labels:
        name: ${PROJECT_PREFIX}-${PROJECT_TYPE}-${PROJECT_NAME}
    spec:
      containers:
        - name: ${PROJECT_PREFIX}-${PROJECT_TYPE}-${PROJECT_NAME}
          image: gcr.io/${GOOGLE_PROJECT_ID}/${PROJECT_TYPE}-${PROJECT_NAME}:${GITHUB_SHA}
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: ${PORT}
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /${PROJECT_NAME}/v1/health
              port: ${PORT}
            initialDelaySeconds: 0
            timeoutSeconds: 10
            periodSeconds: 10
          resources:
            requests:
              cpu: ${RESOURCES_CPU_REQUEST}
              memory: ${RESOURCES_MEMORY_REQUEST}
            limits:
              cpu: ${RESOURCES_CPU_LIMIT}
              memory: ${RESOURCES_MEMORY_LIMIT}
          env:
            - name: NODE_ENV
              value: ${NODE_ENV}
            - name: PORT
              value: '${PORT}'
      imagePullSecrets:
        - name: regcred
Clement
1

I got the same issue when I created a cluster with Terraform. At first I only specified service_account in node_config, so the node pool was created with too few OAuth scopes. After explicitly setting both service_account and oauth_scopes as below, the nodes were able to pull images from private GCR repositories.

resource "google_container_node_pool" "primary_preemptible_nodes" {
  node_config {
    service_account = "${google_service_account.gke_nodes.email}"

    oauth_scopes = [
      "storage-ro",
      "logging-write",
      "monitoring"
    ]
  }
}
  • Thanks @translucens, we ran into this problem and your answer solved it! – Hans Kristian Mar 06 '20 at 17:49
  • How did you specify a service account? I have the same problem, but the above did not solve it. I use the project_no-compute service account. – Mike May 02 '20 at 08:34
  • @Mike If `service_account = "${google_service_account.gke_nodes.email}"` is removed, the default compute service account will be used. – translucens May 02 '20 at 13:14
  • What's wrong with using the default service account? That has worked for me for years now so this isn't the issue. – Nathan McKaskle Jan 31 '23 at 17:25
0

Check the node events for the actual error. For me it said:

Failed to pull image "gcr.io/project/image@sha256:c8e91af54fc17faa1c49d2a05def5cbabf8f0a67fc558eb6cbca138061b8400a":
 rpc error: code = Unknown desc = error pulling image configuration: unknown blob

It turned out the image was gone or corrupted; after pushing the image again, everything worked fine.
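
A sketch of the diagnosis and fix (image name and tag are placeholders): the pull fails the same way outside the cluster, and re-pushing from a machine that still has the image repairs the registry copy.

# Reproduce the failure outside the cluster
docker pull gcr.io/<project>/<image>:<tag>

# Re-push from a machine that still has the image locally
docker push gcr.io/<project>/<image>:<tag>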

Vincent Gerris
0

For those using Terraform:

  1. Find the Terraform definition of your node pool.
  2. Create a service account for it and grant it roles/artifactregistry.reader (or roles/storage.objectViewer for GCR), e.g.:

resource "google_service_account" "kubernetes" {
  account_id = "kubernetes"
}

resource "google_project_iam_member" "allow_image_pull" {
  project = "any-projectname"
  role    = "roles/artifactregistry.reader"
  member  = "serviceAccount:${google_service_account.kubernetes.email}"
}

  3. Within the node_config block, reference that service account, e.g.:

service_account = google_service_account.kubernetes.email
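
Putting it together, a minimal sketch of the node pool wiring (resource name and omitted arguments are hypothetical; the broad cloud-platform scope defers authorization to the IAM role granted above):

resource "google_container_node_pool" "primary" {
  # ...cluster, name, node_count, etc. omitted...
  node_config {
    service_account = google_service_account.kubernetes.email
    oauth_scopes = [
      # Access is limited by IAM roles such as roles/artifactregistry.reader
      "https://www.googleapis.com/auth/cloud-platform",
    ]
  }
}
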
ZenithS