
I set up a Harbor registry, which has worked successfully for a couple of weeks now. For each deployment and namespace I have a secret with the credentials from my ~/.docker/config.json file to get access to the registry. Since last weekend I have not been able to pull images from that registry anymore, and I didn't change anything! The cluster is running on GKE v1.12.5, btw.

What works? I can pull and push images from my local machine with Docker.
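Roughly the commands that still work from my local machine (same registry host and image as in the deployment below):

docker login core.k8s-harbor-test.my-domain.com
docker pull core.k8s-harbor-test.my-domain.com/nginx-test/nginx:1.15.10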

What does not work? My Kubernetes cluster cannot pull images anymore and runs into a timeout.
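The URL from the kubelet error below can also be tested with a plain HTTPS request (a sketch; on GKE's COS nodes curl may only be available inside the /usr/bin/toolbox debug container):

curl -v --max-time 30 https://core.k8s-harbor-test.my-domain.com/v2/

A reachable registry answers here with an HTTP status (typically 401 without credentials); from the cluster nodes this instead hits the client timeout.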

Events:
  Type     Reason          Age                  From                                                       Message
  ----     ------          ----                 ----                                                       -------
  Normal   Scheduled       13m                  default-scheduler                                          Successfully assigned k8s-test7/nginx-k8s-test7-6f7b8fdd79-2ffmp to gke-k8s-cloudops-test-default-pool-72fccd21-hrhk
  Normal   SandboxChanged  12m                  kubelet, gke-k8s-cloudops-test-default-pool-72fccd21-hrhk  Pod sandbox changed, it will be killed and re-created.
  Warning  Failed          11m (x3 over 12m)    kubelet, gke-k8s-cloudops-test-default-pool-72fccd21-hrhk  Failed to pull image "core.k8s-harbor-test.my-domain.com/nginx-test/nginx:1.15.10": rpc error: code = Unknown desc = Error response from daemon: Get https://core.k8s-harbor-test.my-domain.com/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Failed          11m (x3 over 12m)    kubelet, gke-k8s-cloudops-test-default-pool-72fccd21-hrhk  Error: ErrImagePull
  Normal   BackOff         11m (x7 over 12m)    kubelet, gke-k8s-cloudops-test-default-pool-72fccd21-hrhk  Back-off pulling image "core.k8s-harbor-test.my-domain.com/nginx-test/nginx:1.15.10"
  Normal   Pulling         10m (x4 over 13m)    kubelet, gke-k8s-cloudops-test-default-pool-72fccd21-hrhk  pulling image "core.k8s-harbor-test.my-domain.com/nginx-test/nginx:1.15.10"
  Warning  Failed          3m2s (x38 over 12m)  kubelet, gke-k8s-cloudops-test-default-pool-72fccd21-hrhk  Error: ImagePullBackOff

deployment.yaml

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-k8s-test7
  namespace: k8s-test7
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx-k8s-test7
    spec:
      containers:
      - name: nginx-k8s-test7
        image: core.k8s-harbor-test.my-domain.com/nginx-test/nginx:1.15.10
        volumeMounts:
          - name: webcontent
            mountPath: /usr/share/nginx/html
        ports:
        - containerPort: 80
      volumes:
        - name: webcontent
          configMap:
            name: webcontent
      imagePullSecrets:
      - name: harborcred
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: webcontent
  namespace: k8s-test7
  annotations:
    volume.alpha.kubernetes.io/storage-class: default
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 5Gi

The secret "harborcred" is part of every namespace so that the deployment can access it. The secret was created per kubernetes documentation:

https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/

kubectl create secret generic harborcred \
    --from-file=.dockerconfigjson=~/.docker/config.json \
    --type=kubernetes.io/dockerconfigjson \
    --namespace=k8s-test7
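
An equivalent way to create the same secret, without going through ~/.docker/config.json, is kubectl's docker-registry helper (a sketch; <username> and <password> are placeholders for the Harbor account the nodes should use):

kubectl create secret docker-registry harborcred \
    --docker-server=core.k8s-harbor-test.my-domain.com \
    --docker-username=<username> \
    --docker-password=<password> \
    --namespace=k8s-test7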

Any help would be appreciated!

Timo Antweiler

1 Answer


Hi, at first look, could you please:

  1. Change the image source and use a public one, e.g. nginx, to verify your deployment doesn't have other issues (see the sketch right after this list).
  2. The page https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/ also provides more details about inspecting Secrets.
  3. Please also perform additional connectivity tests directly from your node, as described in the post "How to debug ImagePullBackOff?".
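
For point 1, the image can be swapped for a public one without editing the manifest, for example (a sketch; the container name is taken from the deployment above):

kubectl -n k8s-test7 set image deployment/nginx-k8s-test7 nginx-k8s-test7=nginx:1.15.10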

Additional steps to find the root cause:

  1. Decode your secret's data:
     kubectl get secret harborcred -n k8s-test7 --output="jsonpath={.data.\.dockerconfigjson}" | base64 --decode
  2. Compare the "auth" field in the decoded result from step 1 with your Docker credentials, using:
     echo "your auth data" | base64 --decode
  3. Check the namespace events for pull errors:
     kubectl get events -n k8s-test7 | grep pull

Please share your logs.

Mark
  • Hi! I logged in to one of my GKE worker nodes and tried to do a docker login there, and guess what, it failed with the same error message: "request canceled while waiting for connection." I also added another DNS entry to /etc/resolv.conf, but still the same issue. What could that be? – Timo Antweiler Apr 02 '19 at 11:06
  • Regarding point 3: did you check that the node can resolve the DNS name of the Docker registry by performing a ping? – Mark Apr 02 '19 at 11:27
  • Ping is not available on the GKE worker nodes, I guess. – Timo Antweiler Apr 02 '19 at 11:30
  • Oops, sorry! You can start /usr/bin/toolbox, which starts a container with debugging tools. What I found out was that a ping IS working, but a docker login from the nodes in the Netherlands is not. I just created another cluster in the Frankfurt region, and there the docker login from one of the worker nodes worked AND I could install my Kubernetes deployment! So there is an issue with the cluster somehow! Very strange! – Timo Antweiler Apr 02 '19 at 11:43
  • Could you please verify this info: "GKE 1.12.5-gke.10 is no longer available for new clusters" within the release notes (https://cloud.google.com/kubernetes-engine/docs/release-notes)? – Mark Apr 02 '19 at 11:43
  • I know! I'm using 1.12.5-gke.5. I will upgrade to the latest version, 1.12.6-gke.7, and see how it goes. – Timo Antweiler Apr 02 '19 at 12:08
  • OK... the upgrade didn't solve the issue, but if I create any new clusters in any region it works! So what the hell happened to my cluster? Or, in other words, why are the nodes not able to do a docker login? – Timo Antweiler Apr 02 '19 at 13:19
  • Are you able to perform a ping by "IP" and by "name" to verify whether it's a problem with DNS? – Mark Apr 02 '19 at 15:33
  • Ping worked from all hosts by name. It was just the docker login that didn't work. – Timo Antweiler Apr 03 '19 at 19:37
  • After I created a new cluster in GKE, I ran into the same issue again after a week or so! One host is still working, the other doesn't. That is really strange! – Timo Antweiler Apr 16 '19 at 07:33
  • Hi, and what are the results with some public images? Did you verify that it's not an issue related to your registry? – Mark Apr 16 '19 at 07:42
  • I can pull public images, that's not a problem. But I can also pull the Harbor images on my local client! Since this is possible, it is not an issue with the Harbor registry from my point of view. – Timo Antweiler Apr 16 '19 at 08:04
  • Hi Timo, please follow the "Additional steps to find the root cause" and share your logs. – Mark Apr 16 '19 at 09:57