3

When pulling a service-jenkins custom image from ACR, AKS gives the following error:

Warning Failed 0s (x2 over 31s) kubelet Failed to pull image "XXX.azurecr.io/service-jenkins:latest": [rpc error: code = Unknown desc = failed to pull and unpack image "XXX.azurecr.io/service-jenkins:latest": failed to extract layer sha256:XXX: unexpected EOF: unknown, rpc error: code = Unknown desc = failed to pull and unpack image "XXX.azurecr.io/service-jenkins:latest": failed to resolve reference "XXX.azurecr.io/service-jenkins:latest": failed to authorize: failed to fetch anonymous token: unexpected status: 401 Unauthorized]

We have taken the following steps in an attempt to resolve the issue:

  1. Connected AKS to ACR using a service principal (SP) instead of a secret stored in the same namespace
  2. Uploaded a sample hello-world image, which AKS pulls successfully
  3. Verified that the image pull secret matches the ACR keys

We pulled and ran the service-jenkins image with a local Docker engine to check whether the image build itself was at fault, but the container runs normally.

We are unable to pinpoint the exact issue. Any help is appreciated!

Parth

3 Answers

1

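If your error mentions an anonymous token (for example, "failed to fetch anonymous token"), you can enable anonymous pull on the registry from Azure Cloud Shell:

  1. Go to the Azure portal.
  2. Open Cloud Shell and log in to the AKS cluster.
  3. Run the command below, replacing <registry-name> with the name of your registry:

az acr update --name <registry-name> --anonymous-pull-enabled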

Note that this makes all content in your ACR publicly readable (pull access only).

By default, access to pull or push content from an Azure container registry is only available to authenticated users. Enabling anonymous (unauthenticated) pull access makes all registry content publicly available for read (pull) actions. Anonymous pull access can be used in scenarios that do not require user authentication such as distributing public container images.
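
Since this setting exposes the whole registry, it helps to be able to switch it off again. Below is a minimal sketch of toggling it, not an official procedure. The registry name `myregistry`, the `run` helper, and the `set_anonymous_pull` function are hypothetical names for illustration; the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Hypothetical placeholder; replace with your own registry name.
ACR_NAME="${ACR_NAME:-myregistry}"

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Enable ("true") or disable ("false") anonymous pull on the registry.
set_anonymous_pull() {
  local enabled="$1"
  case "$enabled" in
    true|false) ;;
    *) echo "usage: set_anonymous_pull true|false" >&2; return 1 ;;
  esac
  run az acr update --name "$ACR_NAME" --anonymous-pull-enabled "$enabled"
}
```

For example, `set_anonymous_pull false` prints (or, with `DRY_RUN=0`, runs) the command that turns anonymous pull back off once you have finished testing.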

span
1

It turns out this specific issue occurs when both of the following are true:

  1. The AKS Kubernetes version is newer than 1.18.xx
  2. An Ubuntu 20.10 Docker base image is used

After digging into the issue, it seems that Ubuntu 20.10 has some layer duplication that doesn't work well with Microsoft's implementation of the Kubernetes containerd runtime.

I'm no expert, but this is the only difference I noticed on Azure: we also tried the same deployments on IBM Cloud, and they worked as expected.

Simply upgrading the Ubuntu base image to 21.04 fixed the issue for me :)
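
The kubelet message in the question bundles two different failures: a layer extraction error ("unexpected EOF") and an authorization error ("401 Unauthorized"). As a rough aid for telling them apart, here is a small sketch that classifies such a message by the phrases it contains. The function name and the category labels (`auth`, `layer`, `other`) are made up for this example, and matching on substrings is only a heuristic.

```shell
#!/usr/bin/env bash
# Classify a kubelet "Failed to pull image" message by the phrases it contains.
# The labels auth, layer, and other are hypothetical names for this sketch.
classify_pull_error() {
  local msg="$1"
  local kinds=""

  # Authorization problems: bad or missing credentials, no AcrPull role, etc.
  case "$msg" in
    *"401 Unauthorized"*|*"failed to authorize"*) kinds="auth" ;;
  esac

  # Layer problems: the image was reachable but could not be unpacked.
  case "$msg" in
    *"failed to extract layer"*|*"unexpected EOF"*) kinds="${kinds:+$kinds,}layer" ;;
  esac

  # Fall back to "other" when neither phrase is present.
  echo "${kinds:-other}"
}
```

Passing the full event text from `kubectl describe pod` to `classify_pull_error` would report both categories for the error above, which suggests that fixing authentication alone may not be enough.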

Parth
  • I see the same issue with intermittent failures pulling from GitLab into GKE with base images being node:16-alpine, node:18-alpine, and nginx. Though never with rust being the base image. This has been consistent for more than a year. – user239558 Feb 19 '23 at 09:42
-2

How did you connect the AKS with the ACR?

You can do so by using the Azure CLI (details here) or by creating a role assignment on your own (details here).

For the latter, you will have to assign the AcrPull role to the managed identity (or service principal) of the AKS node pool.
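
As a rough sketch of the Azure CLI route, the following attaches the registry to the cluster and then checks that the nodes can pull from it. The resource group, cluster, and registry names are placeholders, and the `run` and `attach_acr` helpers are hypothetical; by default the script only prints the commands, and it executes them when `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders; replace with your own names.
RESOURCE_GROUP="${RESOURCE_GROUP:-myResourceGroup}"
CLUSTER_NAME="${CLUSTER_NAME:-myAKSCluster}"
ACR_NAME="${ACR_NAME:-myregistry}"

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

attach_acr() {
  # Grant the cluster's identity (managed identity or service principal)
  # the AcrPull role on the registry.
  run az aks update --name "$CLUSTER_NAME" --resource-group "$RESOURCE_GROUP" \
    --attach-acr "$ACR_NAME"

  # Check from the cluster side that the registry is reachable and pullable.
  run az aks check-acr --name "$CLUSTER_NAME" --resource-group "$RESOURCE_GROUP" \
    --acr "$ACR_NAME.azurecr.io"
}
```

Calling `attach_acr` with the default `DRY_RUN=1` lets you review the two commands before running them for real.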

Philip Welz