I have a simple application (REST APIs built with Python and Flask) that runs well on Google Kubernetes Engine (GKE). My CI/CD setup builds a Docker image, pushes it to Google Container Registry (GCR), and then deploys it to GKE. Everything worked well until I added a database, which is hosted on Google Cloud SQL. To access the database from Kubernetes, I'm using the Cloud SQL proxy (as a sidecar) and Workload Identity, as recommended by Google.

My problem is, after configuring cloud sql proxy, I'm getting this error:

ImagePullBackOff: Cannot pull image 'gcr.io/xxx-project/xxx-image:xxx-tag' from the registry.

The Cloud SQL proxy image is pulled correctly (I think because it's hosted in a public registry), but my own image is not, so the pod never starts.

Did I miss something? Should I add Docker credentials? It's strange, because the image pull was working before I set up the Cloud SQL proxy.

Many thanks for your help,

Best regards

Slim
  • can you please provide yaml file for your deployment if possible? – Anna Slastnikova Sep 08 '20 at 14:05
  • Pretty sure workload identity is causing the conflict. Which identity are you using? Does the identity have read permission to pull images from your GCR repo? – Patrick W Sep 08 '20 at 15:20
  • @AnnaSlastnikova, my yaml is based on this one: https://github.com/GoogleCloudPlatform/cloudsql-proxy/blob/master/examples/kubernetes/proxy_with_workload_identity.yaml. Under containers, I have `- name: flask-kubernetes-test` with `image: gcr.io/xxx-project/xxx-image:x-tag`. I can pull this image using the docker command from a Compute Engine VM without specifying any account. – Slim Sep 08 '20 at 19:18
  • @PatrickW, yes I assigned the storage admin to my GSA linked to KSA – Slim Sep 08 '20 at 19:31
  • Was the GSA created specifically for this? If not, maybe try recreating it, just to check that the service account binding is not the issue. – Anna Slastnikova Sep 08 '20 at 21:07
  • Are you still experiencing this issue? Also, have you activated all the APIs required for the Cloud SQL connection? – rsalinas Sep 18 '20 at 14:33

1 Answer

I think the important thing to understand here is that Autopilot doesn't use Workload Identity, or any other pod-level permission, to pull images. It uses the default compute service account for your project.

It is the nodes that need permission to pull images, not the pods. See this note from the GCP documentation on Workload Identity:

Note: Even with Workload Identity enabled, GKE still uses the configured Google Service Account for the node pool to pull container images from the image registry. If you encounter ImagePullBackOff or ErrImagePull errors, check the troubleshooting documentation.
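To confirm which container is actually failing to pull, you can inspect the pod's status. Below is a minimal Python sketch, assuming you have saved the output of `kubectl get pod <pod-name> -o json` and loaded it into a dict. The helper name `failing_image_pulls` and the sample pod are hypothetical; only the `status.containerStatuses[].state.waiting.reason` layout comes from the Kubernetes pod status format.

```python
# Reasons Kubernetes reports when a container image cannot be pulled.
PULL_FAILURE_REASONS = {"ImagePullBackOff", "ErrImagePull"}


def failing_image_pulls(pod):
    """Return (container name, image) pairs that are stuck on an image pull.

    `pod` is the dict parsed from `kubectl get pod <pod-name> -o json`.
    """
    failures = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # `waiting` is only present while the container has not started.
        waiting = status.get("state", {}).get("waiting") or {}
        if waiting.get("reason") in PULL_FAILURE_REASONS:
            failures.append((status["name"], status["image"]))
    return failures


# Hypothetical pod status resembling the situation in the question:
# the proxy sidecar runs, the application container cannot be pulled.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "cloud-sql-proxy",
                "image": "gcr.io/cloudsql-docker/gce-proxy:1.17",
                "state": {"running": {"startedAt": "2020-09-08T14:00:00Z"}},
            },
            {
                "name": "flask-kubernetes-test",
                "image": "gcr.io/xxx-project/xxx-image:xxx-tag",
                "state": {"waiting": {"reason": "ImagePullBackOff"}},
            },
        ]
    }
}

print(failing_image_pulls(sample_pod))
```

If only the image from your private registry appears in the result, that points at the node's registry permissions rather than at the pod spec.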

I had the same thing happen to me, and it turned out that the default compute service account had been deleted. I restored it (following the instructions in "Deleted Compute Engine default service account") and granted it the storage.admin role, and that resolved the issue.
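A quick way to check for this is to look for the node service account in the project's IAM policy. The sketch below is an illustration only: it assumes you have loaded the JSON from `gcloud projects get-iam-policy PROJECT_ID --format=json` into a dict, that your node pool uses the default compute service account, and that one of the roles in `PULL_ROLES` (my own shortlist, not an exhaustive one) is how read access to the registry bucket is granted. The function names and the project number `123456789012` are hypothetical.

```python
# Assumed shortlist of roles that include read access to the Cloud Storage
# objects backing GCR. Other roles or custom roles may also work.
PULL_ROLES = {
    "roles/storage.objectViewer",
    "roles/storage.objectAdmin",
    "roles/storage.admin",
}


def default_compute_sa(project_number):
    """Email of the Compute Engine default service account for a project."""
    return f"{project_number}-compute@developer.gserviceaccount.com"


def roles_for(policy, sa_email):
    """Collect the roles bound to a service account in an IAM policy dict."""
    member = f"serviceAccount:{sa_email}"
    return {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }


def can_pull_images(policy, sa_email):
    """True if the account holds at least one role from PULL_ROLES."""
    return bool(roles_for(policy, sa_email) & PULL_ROLES)


# Hypothetical policy: the Workload Identity GSA has storage.admin,
# but the default compute service account has no binding at all.
sample_policy = {
    "bindings": [
        {
            "role": "roles/storage.admin",
            "members": ["serviceAccount:my-gsa@xxx-project.iam.gserviceaccount.com"],
        },
        {
            "role": "roles/cloudsql.client",
            "members": ["serviceAccount:my-gsa@xxx-project.iam.gserviceaccount.com"],
        },
    ]
}

node_sa = default_compute_sa("123456789012")
print(node_sa, can_pull_images(sample_policy, node_sa))
```

Note that this only inspects project-level bindings. Access can also be granted on the registry's storage bucket itself, so a negative result here is a hint to investigate, not proof that the permission is missing.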

Tom Greenwood