
I am very confused about why my pods are stuck in Pending status.

Vitess seems to have a problem scheduling the vttablet pods onto nodes. I built a 2-worker-node Kubernetes cluster (nodes A & B) and started vttablets on it, but only two vttablets start normally; the other three stay in Pending state.

When I allow the master node to schedule pods, the three pending vttablets all start on the master (erroring at first, then running normally), but when I then create tables, two of the vttablets fail to execute.

When I add two new nodes (nodes C & D) to my Kubernetes cluster, tear down Vitess, and restart the vttablets, the three vttablet pods still remain in Pending state. Also, if I remove node A or node B from the cluster, the vttablet running there is lost and does not get rescheduled onto a new node. Finally, I tore down both Vitess and the Kubernetes cluster and rebuilt everything, this time using nodes C & D as the two worker nodes, and now all of the vttablets remain in Pending status.
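
For reference, allowing the master node to schedule pods is normally done by removing its taint; on a kubeadm cluster that is something like the following (the node name is a placeholder):

kubectl taint nodes <master-node-name> node-role.kubernetes.io/master-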

NAMESPACE     NAME                               READY     STATUS    RESTARTS   AGE       IP            NODE               NOMINATED NODE
default       etcd-global-5zh4k77slf             1/1       Running   0          46m       192.168.2.3   t-searchredis-a2   <none>
default       etcd-global-f7db9nnfq9             1/1       Running   0          45m       192.168.2.5   t-searchredis-a2   <none>
default       etcd-global-ksh5r9k45l             1/1       Running   0          45m       192.168.1.4   t-searchredis-a1   <none>
default       etcd-operator-6f44498865-t84l5     1/1       Running   0          50m       192.168.2.2   t-searchredis-a2   <none>
default       etcd-test-5g5lmcrl2x               1/1       Running   0          46m       192.168.2.4   t-searchredis-a2   <none>
default       etcd-test-g4xrkk7wgg               1/1       Running   0          45m       192.168.1.5   t-searchredis-a1   <none>
default       etcd-test-jkq4rjrwm8               1/1       Running   0          45m       192.168.2.6   t-searchredis-a2   <none>
default       vtctld-z5d46                       1/1       Running   0          44m       192.168.1.6   t-searchredis-a1   <none>
default       vttablet-100                       0/2       Pending   0          40m       <none>        <none>             <none>
default       vttablet-101                       0/2       Pending   0          40m       <none>        <none>             <none>
default       vttablet-102                       0/2       Pending   0          40m       <none>        <none>             <none>
default       vttablet-103                       0/2       Pending   0          40m       <none>        <none>             <none>
default       vttablet-104                       0/2       Pending   0          40m       <none>        <none>             <none>
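
Here is the full YAML of one of the pending pods, presumably the output of kubectl get pod vttablet-100 -o yaml: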


apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: 2018-11-27T07:25:19Z
  labels:
    app: vitess
    component: vttablet
    keyspace: test_keyspace
    shard: "0"
    tablet: test-0000000100
  name: vttablet-100
  namespace: default
  resourceVersion: "22304"
  selfLink: /api/v1/namespaces/default/pods/vttablet-100
  uid: 98258046-f215-11e8-b6a1-fa163e0411d1
spec:
  containers:
  - command:
    - bash
    - -c
    - |-
      set -e
      mkdir -p $VTDATAROOT/tmp
      chown -R vitess /vt
      su -p -s /bin/bash -c "/vt/bin/vttablet -binlog_use_v3_resharding_mode -topo_implementation etcd2 -topo_global_server_address http://etcd-global-client:2379 -topo_global_root /global -log_dir $VTDATAROOT/tmp -alsologtostderr -port 15002 -grpc_port 16002 -service_map 'grpc-queryservice,grpc-tabletmanager,grpc-updatestream' -tablet-path test-0000000100 -tablet_hostname $(hostname -i) -init_keyspace test_keyspace -init_shard 0 -init_tablet_type replica -health_check_interval 5s -mysqlctl_socket $VTDATAROOT/mysqlctl.sock -enable_semi_sync -enable_replication_reporter -orc_api_url http://orchestrator/api -orc_discover_interval 5m -restore_from_backup -backup_storage_implementation file -file_backup_storage_root '/usr/local/MySQL_DB_Backup/test'" vitess
    env:
    - name: EXTRA_MY_CNF
      value: /vt/config/mycnf/master_mysql56.cnf
    image: vitess/lite
    imagePullPolicy: Always
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /debug/vars
        port: 15002
        scheme: HTTP
      initialDelaySeconds: 60
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10
    name: vttablet
    ports:
    - containerPort: 15002
      name: web
      protocol: TCP
    - containerPort: 16002
      name: grpc
      protocol: TCP
    resources:
      limits:
        cpu: 500m
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 1Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /dev/log
      name: syslog
    - mountPath: /vt/vtdataroot
      name: vtdataroot
    - mountPath: /etc/ssl/certs/ca-certificates.crt
      name: certs
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-7g2jb
      readOnly: true
  - command:
    - sh
    - -c
    - |-
      mkdir -p $VTDATAROOT/tmp && chown -R vitess /vt
      su -p -c "/vt/bin/mysqlctld -log_dir $VTDATAROOT/tmp -alsologtostderr -tablet_uid 100 -socket_file $VTDATAROOT/mysqlctl.sock -init_db_sql_file $VTROOT/config/init_db.sql" vitess
    env:
    - name: EXTRA_MY_CNF
      value: /vt/config/mycnf/master_mysql56.cnf
    image: vitess/lite
    imagePullPolicy: Always
    name: mysql
    resources:
      limits:
        cpu: 500m
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 1Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /dev/log
      name: syslog
    - mountPath: /vt/vtdataroot
      name: vtdataroot
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-7g2jb
      readOnly: true
  dnsPolicy: ClusterFirst
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - hostPath:
      path: /dev/log
      type: ""
    name: syslog
  - emptyDir: {}
    name: vtdataroot
  - hostPath:
      path: /etc/ssl/certs/ca-certificates.crt
      type: ""
    name: certs
  - name: default-token-7g2jb
    secret:
      defaultMode: 420
      secretName: default-token-7g2jb
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2018-11-27T07:25:19Z
    message: '0/3 nodes are available: 1 node(s) had taints that the pod didn''t tolerate,
      2 Insufficient cpu.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: Guaranteed

1 Answer

As you can see at the bottom, in the pod's status conditions:

message: '0/3 nodes are available: 1 node(s) had taints that the pod didn''t tolerate,
  2 Insufficient cpu.'

This means your two worker nodes are out of CPU for the requests you specified in the pod spec (the third node is the tainted master). Each vttablet pod runs two containers (vttablet and mysql), each requesting 500m CPU, so every pod needs a full core, and the five pods need 5 cores on top of whatever etcd and vtctld already consume. You will need more workers, or smaller CPU requests.
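
For example, a minimal sketch of the second option, assuming you lower each container's CPU request to 100m and keep the 500m limit (the value you can actually afford depends on each node's allocatable CPU):

resources:
  requests:
    cpu: 100m
    memory: 1Gi
  limits:
    cpu: 500m
    memory: 1Gi

Note that once requests and limits differ, the pod's QoS class changes from Guaranteed to Burstable. You can check how much CPU each node has left to allocate with:

kubectl describe nodes | grep -A 4 'Allocated resources'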
