
I deployed Kubeflow with the following command:

juju deploy kubeflow

The following two pods did not start; both report ImagePullBackOff:

  1. kfp-viz
  2. kfp-profile-controller
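To check which pods in the kubeflow namespace are stuck, the kubectl get pods listing can be filtered on its STATUS column. This is a minimal sketch: find_backoff_pods is a hypothetical helper name, and it assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE). That layout holds for -n kubeflow but not for -A, which prepends a NAMESPACE column.

```shell
# Hypothetical helper: print the names of pods whose STATUS column shows an
# image-pull failure. Reads `kubectl get pods -n <namespace>` output on stdin.
# Assumes STATUS is the 3rd column (not true with -A / --all-namespaces).
find_backoff_pods() {
  # NR > 1 skips the header row.
  awk 'NR > 1 && ($3 == "ImagePullBackOff" || $3 == "ErrImagePull") { print $1 }'
}
```

On MicroK8s this would be used as: microk8s kubectl get pods -n kubeflow | find_backoff_pods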

Output of kubectl describe pod for kfp-viz:

Name:         kfp-viz-65bc89cd9b-dnng9
Namespace:    kubeflow
Priority:     0
Node:         mlops/10.50.60.90
Start Time:   Mon, 30 May 2022 05:22:25 +0000
Labels:       app.kubernetes.io/name=kfp-viz
              pod-template-hash=65bc89cd9b
Annotations:  apparmor.security.beta.kubernetes.io/pod: runtime/default
              charm.juju.is/modified-version: 0
              cni.projectcalico.org/podIP: 10.1.190.114/32
              cni.projectcalico.org/podIPs: 10.1.190.114/32
              controller.juju.is/id: ae0e16a7-f10b-41e9-8f64-35346b6c91dd
              model.juju.is/id: a6f5b73a-cb38-42cd-8f8e-36aced0c1bbd
              seccomp.security.beta.kubernetes.io/pod: docker/default
              unit.juju.is/id: kfp-viz/0
Status:       Pending
IP:           10.1.190.114
IPs:
  IP:           10.1.190.114
Controlled By:  ReplicaSet/kfp-viz-65bc89cd9b
Init Containers:
  juju-pod-init:
    Container ID:  containerd://ffab869ab2beeab6a6bfde53be1175da58044c35da4d0bc2b66db231c585b142
    Image:         jujusolutions/jujud-operator:2.9.29
    Image ID:      sha256:47127013daad1c7215de0566f312d2a00eb83b3ef898e7b07f1ceb9e42860b4a
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
    Args:
      -c
      export JUJU_DATA_DIR=/var/lib/juju
      export JUJU_TOOLS_DIR=$JUJU_DATA_DIR/tools

      mkdir -p $JUJU_TOOLS_DIR
      cp /opt/jujud $JUJU_TOOLS_DIR/jujud

      initCmd=$($JUJU_TOOLS_DIR/jujud help commands | grep caas-unit-init)
      if test -n "$initCmd"; then
      $JUJU_TOOLS_DIR/jujud caas-unit-init --debug --wait;
      else
      exit 0
      fi

    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 30 May 2022 05:22:35 +0000
      Finished:     Mon, 30 May 2022 05:24:10 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/juju from juju-data-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-82xc9 (ro)
Containers:
  ml-pipeline-visualizationserver:
    Container ID:
    Image:          registry.jujucharms.com/charm/c2o31yht1y825t6n49mwko4wyel0rracnrjn5/oci-image@sha256:13c46cf878062fd6ad672cbec4854eba7e869cd0123a8975bea49b9d75d4e698
    Image ID:
    Port:           8888/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Liveness:       exec [wget -q -S -O - http://localhost:8888/] delay=3s timeout=2s period=5s #success=1 #failure=3
    Readiness:      exec [wget -q -S -O - http://localhost:8888/] delay=3s timeout=2s period=5s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /usr/bin/juju-run from juju-data-dir (rw,path="tools/jujud")
      /var/lib/juju from juju-data-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-82xc9 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  juju-data-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-82xc9:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/arch=amd64
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                   From     Message
  ----     ------   ----                  ----     -------
  Warning  Failed   10m (x338 over 35h)   kubelet  (combined from similar events): Failed to pull image "registry.jujucharms.com/charm/c2o31yht1y825t6n49mwko4wyel0rracnrjn5/oci-image@sha256:13c46cf878062fd6ad672cbec4854eba7e869cd0123a8975bea49b9d75d4e698": rpc error: code = FailedPrecondition desc = failed to pull and unpack image "registry.jujucharms.com/charm/c2o31yht1y825t6n49mwko4wyel0rracnrjn5/oci-image@sha256:13c46cf878062fd6ad672cbec4854eba7e869cd0123a8975bea49b9d75d4e698": failed commit on ref "layer-sha256:d8f1984ce468ddfcc0f2752e09c8bbb5ea8513d55d5dd5f911d2d3dd135e5a84": "layer-sha256:d8f1984ce468ddfcc0f2752e09c8bbb5ea8513d55d5dd5f911d2d3dd135e5a84" failed size validation: 19367144 != 32561845: failed precondition
  Normal   BackOff  10s (x4271 over 36h)  kubelet  Back-off pulling image "registry.jujucharms.com/charm/c2o31yht1y825t6n49mwko4wyel0rracnrjn5/oci-image@sha256:13c46cf878062fd6ad672cbec4854eba7e869cd0123a8975bea49b9d75d4e698"

Output of kubectl describe pod for kfp-profile-controller (the operator pod and the workload pod):

Name:         kfp-profile-controller-operator-0
Namespace:    kubeflow
Priority:     0
Node:         mlops/10.50.60.90
Start Time:   Mon, 30 May 2022 05:15:50 +0000
Labels:       controller-revision-hash=kfp-profile-controller-operator-755985f4fc
              operator.juju.is/name=kfp-profile-controller
              operator.juju.is/target=application
              statefulset.kubernetes.io/pod-name=kfp-profile-controller-operator-0
Annotations:  apparmor.security.beta.kubernetes.io/pod: runtime/default
              cni.projectcalico.org/podIP: 10.1.190.93/32
              cni.projectcalico.org/podIPs: 10.1.190.93/32
              controller.juju.is/id: ae0e16a7-f10b-41e9-8f64-35346b6c91dd
              juju.is/version: 2.9.29
              model.juju.is/id: a6f5b73a-cb38-42cd-8f8e-36aced0c1bbd
              seccomp.security.beta.kubernetes.io/pod: docker/default
Status:       Running
IP:           10.1.190.93
IPs:
  IP:           10.1.190.93
Controlled By:  StatefulSet/kfp-profile-controller-operator
Containers:
  juju-operator:
    Container ID:  containerd://a4cc8356c4ce7d8bbb9282ccee17c38937a7a3a3bfa13caa29eaeccb63a86d30
    Image:         jujusolutions/jujud-operator:2.9.29
    Image ID:      sha256:47127013daad1c7215de0566f312d2a00eb83b3ef898e7b07f1ceb9e42860b4a
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
    Args:
      -c
      export JUJU_DATA_DIR=/var/lib/juju
      export JUJU_TOOLS_DIR=$JUJU_DATA_DIR/tools

      mkdir -p $JUJU_TOOLS_DIR
      cp /opt/jujud $JUJU_TOOLS_DIR/jujud

      $JUJU_TOOLS_DIR/jujud caasoperator --application-name=kfp-profile-controller --debug

    State:          Running
      Started:      Mon, 30 May 2022 05:16:05 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      JUJU_APPLICATION:          kfp-profile-controller
      JUJU_OPERATOR_SERVICE_IP:  10.152.183.68
      JUJU_OPERATOR_POD_IP:       (v1:status.podIP)
      JUJU_OPERATOR_NAMESPACE:   kubeflow (v1:metadata.namespace)
    Mounts:
      /var/lib/juju/agents from charm (rw)
      /var/lib/juju/agents/application-kfp-profile-controller/operator.yaml from kfp-profile-controller-operator-config (rw,path="operator.yaml")
      /var/lib/juju/agents/application-kfp-profile-controller/template-agent.conf from kfp-profile-controller-operator-config (rw,path="template-agent.conf")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6x9qh (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  charm:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  charm-kfp-profile-controller-operator-0
    ReadOnly:   false
  kfp-profile-controller-operator-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kfp-profile-controller-operator-config
    Optional:  false
  kube-api-access-6x9qh:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>

Name:         kfp-profile-controller-68f554c765-7vvmj
Namespace:    kubeflow
Priority:     0
Node:         mlops/10.50.60.90
Start Time:   Mon, 30 May 2022 05:34:26 +0000
Labels:       app.kubernetes.io/name=kfp-profile-controller
              pod-template-hash=68f554c765
Annotations:  apparmor.security.beta.kubernetes.io/pod: runtime/default
              charm.juju.is/modified-version: 0
              cni.projectcalico.org/podIP: 10.1.190.134/32
              cni.projectcalico.org/podIPs: 10.1.190.134/32
              controller.juju.is/id: ae0e16a7-f10b-41e9-8f64-35346b6c91dd
              model.juju.is/id: a6f5b73a-cb38-42cd-8f8e-36aced0c1bbd
              seccomp.security.beta.kubernetes.io/pod: docker/default
              unit.juju.is/id: kfp-profile-controller/0
Status:       Pending
IP:           10.1.190.134
IPs:
  IP:           10.1.190.134
Controlled By:  ReplicaSet/kfp-profile-controller-68f554c765
Init Containers:
  juju-pod-init:
    Container ID:  containerd://7a1f2d07bce6f58bd7e9899e0f26a2cd36abfacad55d0e6b43fbb8fcb93df289
    Image:         jujusolutions/jujud-operator:2.9.29
    Image ID:      sha256:47127013daad1c7215de0566f312d2a00eb83b3ef898e7b07f1ceb9e42860b4a
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
    Args:
      -c
      export JUJU_DATA_DIR=/var/lib/juju
      export JUJU_TOOLS_DIR=$JUJU_DATA_DIR/tools

      mkdir -p $JUJU_TOOLS_DIR
      cp /opt/jujud $JUJU_TOOLS_DIR/jujud

      initCmd=$($JUJU_TOOLS_DIR/jujud help commands | grep caas-unit-init)
      if test -n "$initCmd"; then
      $JUJU_TOOLS_DIR/jujud caas-unit-init --debug --wait;
      else
      exit 0
      fi

    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 30 May 2022 05:34:32 +0000
      Finished:     Mon, 30 May 2022 05:35:50 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/juju from juju-data-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lqf7h (ro)
Containers:
  kubeflow-pipelines-profile-controller:
    Container ID:
    Image:         registry.jujucharms.com/charm/gm1axzm8pxqlan75l3a7znu2mv5bf0pm1wfar/oci-image@sha256:14ec52252771f8fa904afbdac497c80fc3234d518b1e0bced0c810d5748a7347
    Image ID:
    Port:          80/TCP
    Host Port:     0/TCP
    Command:
      python
    Args:
      /hooks/sync.py
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:
      CONTROLLER_PORT:             80
      DISABLE_ISTIO_SIDECAR:       false
      KFP_DEFAULT_PIPELINE_ROOT:
      KFP_VERSION:                 1.7.0-rc.3
      METADATA_GRPC_SERVICE_HOST:  mlmd.kubeflow
      METADATA_GRPC_SERVICE_PORT:  8080
      MINIO_ACCESS_KEY:            minio
      MINIO_HOST:                  minio
      MINIO_NAMESPACE:             kubeflow
      MINIO_PORT:                  9000
      MINIO_SECRET_KEY:            SV25N7GCN6HAYV19M4GMHGX0YTZ840
    Mounts:
      /hooks from kubeflow-pipelines-profile-controller-code (rw)
      /usr/bin/juju-run from juju-data-dir (rw,path="tools/jujud")
      /var/lib/juju from juju-data-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lqf7h (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  juju-data-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kubeflow-pipelines-profile-controller-code:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kubeflow-pipelines-profile-controller-code
    Optional:  false
  kube-api-access-lqf7h:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/arch=amd64
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                    From     Message
  ----     ------   ----                   ----     -------
  Normal   BackOff  28m (x4631 over 38h)   kubelet  Back-off pulling image "registry.jujucharms.com/charm/gm1axzm8pxqlan75l3a7znu2mv5bf0pm1wfar/oci-image@sha256:14ec52252771f8fa904afbdac497c80fc3234d518b1e0bced0c810d5748a7347"
  Warning  Failed   11m (x207 over 38h)    kubelet  Error: ErrImagePull
  Warning  Failed   5m51s (x497 over 37h)  kubelet  (combined from similar events): Failed to pull image "registry.jujucharms.com/charm/gm1axzm8pxqlan75l3a7znu2mv5bf0pm1wfar/oci-image@sha256:14ec52252771f8fa904afbdac497c80fc3234d518b1e0bced0c810d5748a7347": rpc error: code = FailedPrecondition desc = failed to pull and unpack image "registry.jujucharms.com/charm/gm1axzm8pxqlan75l3a7znu2mv5bf0pm1wfar/oci-image@sha256:14ec52252771f8fa904afbdac497c80fc3234d518b1e0bced0c810d5748a7347": failed commit on ref "layer-sha256:3bfc3875e0f70f1fb305c87b4bc4d886f1118ddfedfded03ef0cb1c394cb90f0": "layer-sha256:3bfc3875e0f70f1fb305c87b4bc4d886f1118ddfedfded03ef0cb1c394cb90f0" failed size validation: 20672632 != 53354193: failed precondition
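Both Events sections fail the same way: containerd rejects a layer because its size does not match (19367144 vs 32561845 bytes for kfp-viz, 20672632 vs 53354193 bytes for kfp-profile-controller). My reading, stated as an assumption, is that the first figure is what was actually written and the second is what was expected, which would mean the downloads are being cut off partway rather than refused outright. The sketch below, using the hypothetical helper name summarize_size_errors, extracts those figures from the describe output so repeated attempts can be compared.

```shell
# Hypothetical helper: summarize containerd "failed size validation" errors.
# Reads `kubectl describe pod` or `kubectl get events` text on stdin and prints
# one line per match: <layer digest> <bytes on disk> <bytes expected> <percent>.
# Assumption: the first number in the message is the size actually written,
# the second is the size the registry said the layer should have.
summarize_size_errors() {
  grep -o 'layer-sha256:[0-9a-f]*" failed size validation: [0-9]* != [0-9]*' |
    sed -E 's/^layer-(sha256:[0-9a-f]+)" failed size validation: ([0-9]+) != ([0-9]+)$/\1 \2 \3/' |
    awk '{ printf "%s %s %s %d%%\n", $1, $2, $3, int($2 * 100 / $3) }'
}
```

For example: microk8s kubectl describe pod -n kubeflow kfp-viz-65bc89cd9b-dnng9 | summarize_size_errors. If the received byte count changes from one attempt to the next, that would point toward an unstable connection or proxy rather than a missing image.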

Environment:
  1. Kubeflow version: 1.21
  2. kfctl version: kfctl v1.0.1-0-gf3edb9b
  3. Kubernetes platform: MicroK8s
  4. Kubernetes version: client v1.24.0, server v1.21.12-3+6937f71915b56b
  5. OS: Linux 18.04
  • Does this answer your question? [How can I debug "ImagePullBackOff"?](https://stackoverflow.com/questions/34848422/how-can-i-debug-imagepullbackoff) – Affes Salem Jun 14 '22 at 10:26

1 Answer


Apparently, these pods are unable to pull their images: in the Events, every pull attempt fails with a layer "failed size validation" error.

Do these systems have internet access, either directly or through a proxy?

Either way, a quick workaround is to make the images available on every node beforehand, for example:

docker pull registry.jujucharms.com/charm/gm1axzm8pxqlan75l3a7znu2mv5bf0pm1wfar/oci-image@sha256:14ec52252771f8fa904afbdac497c80fc3234d518b1e0bced0c810d5748a7347

You can find the other image references in the Events section of the describe output you posted.
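One caveat on the docker pull above: the Container ID fields in the describe output start with containerd://, so the kubelet on this MicroK8s node pulls through containerd, and an image pulled into Docker's own store may not be visible to it. MicroK8s bundles the containerd CLI as microk8s ctr; using microk8s ctr images pull here is an assumption worth confirming (for example, by checking microk8s ctr images ls afterwards). The sketch below uses the hypothetical helper name prepull_images and takes the pull command as arguments, so either tool can be plugged in.

```shell
# Hypothetical helper: pre-pull a list of image references on the current node.
# Reads one reference per line on stdin; "$@" is the pull command, e.g.
#   microk8s ctr images pull   (assumption: containerd CLI bundled with MicroK8s)
#   docker pull                (only helps if the kubelet uses Docker)
# A failed pull is reported but does not stop the loop; the function returns
# non-zero if any pull failed.
prepull_images() {
  _prepull_failed=0
  # Deduplicate first; image references contain no whitespace.
  for ref in $(LC_ALL=C sort -u); do
    echo "pulling $ref" >&2
    if ! "$@" "$ref"; then
      echo "FAILED: $ref" >&2
      _prepull_failed=1
    fi
  done
  return "$_prepull_failed"
}
```

For example: echo registry.jujucharms.com/charm/gm1axzm8pxqlan75l3a7znu2mv5bf0pm1wfar/oci-image@sha256:14ec52252771f8fa904afbdac497c80fc3234d518b1e0bced0c810d5748a7347 | prepull_images microk8s ctr images pull. If the registry requires credentials, as a later comment suggests, the pull command will need them as well.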

Raja Ravindra
  • Can you please share where we can find these Docker registry images? – Soumil Maitra Jun 15 '22 at 11:34
  • Looks like this is it -- registry.jujucharms.com/charm/gm1axzm8pxqlan75l3a7znu2mv5bf0pm1wfar/oci-image – Raja Ravindra Jun 15 '22 at 11:44
  • I wanted to know where we can get the registry images for these pods. – Soumil Maitra Jun 16 '22 at 06:19
  • Looks like Juju pulls them automatically using some security token; they are available at registry.jujucharms.com. For an offline installation, the easiest way is to take all these images from any live environment you have and load them into the new one. – Raja Ravindra Jun 16 '22 at 09:57
  • We have not been able to open the link you provided. We want to find the image address for particular pods. – Soumil Maitra Jun 17 '22 at 06:55
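On finding the image address for a particular pod: every container's image reference appears on an Image: line of the kubectl describe pod output, and the same information is available directly with kubectl get pod <pod-name> -n kubeflow -o jsonpath='{.spec.containers[*].image}'. The sketch below, using the hypothetical helper name list_pod_images, extracts the references from describe output that has already been captured, as in the question. Note that it also lists init-container images such as jujusolutions/jujud-operator, which pulled fine here.

```shell
# Hypothetical helper: list the unique image references in `kubectl describe pod`
# output read from stdin. Only lines whose first field is exactly "Image:" match,
# so "Image ID:" lines (first field "Image") are skipped.
list_pod_images() {
  awk '$1 == "Image:" { print $2 }' | LC_ALL=C sort -u
}
```

For example: microk8s kubectl describe pod -n kubeflow kfp-viz-65bc89cd9b-dnng9 | list_pod_images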