
I have a cluster running on Azure. I have a deployment of a peer service on that cluster, but the pods for that deployment are not getting created. I have also scaled up the ReplicaSet for that deployment.

Even when I try to create a simple deployment of the Docker busybox image, the pods are not created.

Please guide me on what the issue could be.
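For reference, even a busybox test deployment as minimal as the following sketch produces no pods (the name is illustrative; the namespace matches my peer deployment):

kubectl apply -n internal -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-test   # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox-test
  template:
    metadata:
      labels:
        app: busybox-test
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["sleep", "3600"]   # keep the container alive
EOF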

EDIT

Output of `kubectl describe deployment`:

Name:               peer0-org-myorg
Namespace:          internal
CreationTimestamp:  Tue, 28 May 2019 06:12:21 +0000
Labels:             cattle.io/creator=norman
                    workload.user.cattle.io/workloadselector=deployment-internal-peer0-org-myorg
Annotations:        deployment.kubernetes.io/revision=1
                    field.cattle.io/creatorId=user-b29mj
                    field.cattle.io/publicEndpoints=null
Selector:           workload.user.cattle.io/workloadselector=deployment-internal-peer0-org-myorg
Replicas:           1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:       Recreate
MinReadySeconds:    0
Pod Template:
  Labels:       workload.user.cattle.io/workloadselector=deployment-internal-peer0-org-myorg
  Annotations:  cattle.io/timestamp=2019-06-11T08:19:40Z
                field.cattle.io/ports=[[{"containerPort":7051,"dnsName":"peer0-org-myorg-hostport","hostPort":7051,"kind":"HostPort","name":"7051tcp70510","protocol":"TCP","sourcePort":7051},{"containerPo...
  Containers:
   peer0-org-myorg:
    Image:       hyperledger/fabric-peer:1.4.0
    Ports:       7051/TCP, 7053/TCP
    Host Ports:  7051/TCP, 7053/TCP
    Environment:
      CORE_LEDGER_STATE_COUCHDBCONFIG_COUCHDBADDRESS:  couchdb0:5984
      CORE_LEDGER_STATE_COUCHDBCONFIG_PASSWORD:        root
      CORE_LEDGER_STATE_COUCHDBCONFIG_USERNAME:        root
      CORE_LEDGER_STATE_STATEDATABASE:                 CouchDB
      CORE_LOGGING_CAUTHDSL:                           INFO
      CORE_LOGGING_GOSSIP:                             WARNING
      CORE_LOGGING_GRPC:                               WARNING
      CORE_LOGGING_MSP:                                WARNING
      CORE_PEER_ADDRESS:                               peer0-org-myorg-com:7051
      CORE_PEER_ADDRESSAUTODETECT:                     true
      CORE_PEER_FILESYSTEMPATH:                        /var/hyperledger/peers/peer0/production
      CORE_PEER_GOSSIP_EXTERNALENDPOINT:               peer0-org-myorg-com:7051
      CORE_PEER_GOSSIP_ORGLEADER:                      false
      CORE_PEER_GOSSIP_USELEADERELECTION:              true
      CORE_PEER_ID:                                    peer0.org.myorg.com
      CORE_PEER_LOCALMSPID:                            orgMSP
      CORE_PEER_MSPCONFIGPATH:                         /mnt/crypto/crypto-config/peerOrganizations/org.myorg.com/peers/peer0.org.myorg.com/msp
      CORE_PEER_PROFILE_ENABLED:                       true
      CORE_PEER_TLS_CERT_FILE:                         /mnt/crypto/crypto-config/peerOrganizations/org.myorg.com/peers/peer0.org.myorg.com/tls/server.crt
      CORE_PEER_TLS_ENABLED:                           false
      CORE_PEER_TLS_KEY_FILE:                          /mnt/crypto/crypto-config/peerOrganizations/org.myorg.com/peers/peer0.org.myorg.com/tls/server.key
      CORE_PEER_TLS_ROOTCERT_FILE:                     /mnt/crypto/crypto-config/peerOrganizations/org.myorg.com/peers/peer0.org.myorg.com/tls/ca.crt
      CORE_PEER_TLS_SERVERHOSTOVERRIDE:                peer0.org.myorg.com
      CORE_VM_ENDPOINT:                                unix:///host/var/run/docker.sock
      FABRIC_LOGGING_SPEC:                             DEBUG
    Mounts:
      /host/var/run from worker1-dockersock (ro)
      /mnt/crypto from crypto (ro)
      /var/hyperledger/peers from vol2 (rw)
  Volumes:
   crypto:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  worker1-crypto-pvc
    ReadOnly:   false
   vol2:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  worker1-pvc
    ReadOnly:   false
   worker1-dockersock:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  worker1-dockersock
    ReadOnly:   false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  peer0-org-myorg-6d6645ddd7 (1/1 replicas created)
NewReplicaSet:   <none>
Events:          <none>
Pankaj Cheema
  • `kubectl describe deploy %deploymentname%` and see if it says something meaningful – 4c74356b41 Jun 11 '19 at 10:43
  • As you stated, the deployment was created but no pods were. What we need is the output of the ReplicaSet to figure out why it wasn't able to create the pods. Can you do a `kubectl get replicaset`, find the one corresponding to your deployment, and then `kubectl describe replicaset `? (See the sketch below.) – lindluni Jun 11 '19 at 21:18
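A sketch of those ReplicaSet checks, with the namespace and ReplicaSet name taken from the describe output above:

kubectl -n internal get replicaset
kubectl -n internal describe replicaset peer0-org-myorg-6d6645ddd7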

4 Answers


There are a million reasons why your pods could be broken, and there is plenty of information you can gather on why the pods are not being created. I would start with:

What are the pods saying:

kubectl get pods --all-namespaces -o wide

If you can see the pods but they have errors, what do the errors say? Describe the broken pods:

kubectl describe pod <pod-name>

Or grab the logs:

kubectl logs <pod-name>

Maybe something went wrong with your deployment. Check the deployments:

kubectl get deployments

Describe the deployments (as with the pods above) and look for errors:
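kubectl describe deployment <deployment-name>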

We can't really help you until you provide way more information. What debugging attempts have you made so far? What errors are displayed, and where are you seeing them? What actually happens when there is an attempt to create the pods?

kubectl Get/Describe/Log everything and let us know what's actually happening.
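Cluster events are also worth a look, since they often record why a pod was never scheduled or created. A sketch, assuming the `internal` namespace from the question:

kubectl get events -n internal --sort-by=.metadata.creationTimestamp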

These commands are a good place to start.

EDIT: Added a pic of troubleshooting in Azure Portal (mentioned in comments below)

(Screenshot: locating the Deployments list and the Activity Log for a resource group in the Azure Portal.)

Old Schooled
  • When I run `kubectl get pods --all-namespaces -o wide`, I don't see any pod for the peer deployment. This is the issue: the deployment is getting created but there are no pods. Check the updated question – Pankaj Cheema Jun 11 '19 at 11:08
  • I am able to see the deployment for the peer but not the pods – Pankaj Cheema Jun 11 '19 at 11:54
  • Can you clarify that last comment? Is the deployment that you're searching for on the list when you run `get deployments`, or isn't it? What about if you run `kubectl rollout status `? – Old Schooled Jun 11 '19 at 11:57
  • If the deployment that you want to deploy has been written and applied correctly, you should see pods that are at the very least attempting to create themselves. We'd see them in the list and be able to at least see errors. It seems like something isn't correct with the deployment itself. Check the yaml that you are deploying for errors. Manually apply this yaml to your cluster (`kubectl apply`) and check the result. What does `get deployments` return? The describe? What do the pods look like after? Are they there now? – Old Schooled Jun 11 '19 at 12:00
  • Also! Go to the Azure Portal. Go to the resource group in question. Check the 'Activity' tab for events. Are there any errors here? Also, under the overview tab, in the top left, you should have a link to 'Deployments'. Check this for errors as well. Your failed deployment should be listed here, and it can tell you what went wrong. – Old Schooled Jun 11 '19 at 12:02
  • what, how would resource group deployments be connected to kubernetes deployments?? – 4c74356b41 Jun 11 '19 at 12:28
  • Within Azure, resource groups are the environments in which you deploy a cluster. These are separate things. Your Azure RG is where your setup lives: the virtual machines (workers and master), disks, DB servers, etc. 'Resource group', in this sense, is an Azure term that is not related to K8s; it's how Azure divides working space. Through the portal, you can view this resource group, check the status of all the pieces within, and, more importantly, see information on every attempted deployment. – Old Schooled Jun 11 '19 at 12:57
  • Check the edit, added a picture of what I'm talking about in Azure. – Old Schooled Jun 11 '19 at 13:07
  • The deployment is already there in the list, but there are no pods for it in the pod list. They are not getting created, and if I delete the pods of a running deployment in my cluster, they are not recreated – Pankaj Cheema Jun 11 '19 at 13:19
  • To confirm: in Azure under Deployments it says '1 Succeeded' without any failures, and in Azure under the Activity Log you have no errors at all? – Old Schooled Jun 11 '19 at 13:22
  • Also, something new... Your deployment is in namespace "internal". What happens when you `kubectl get pods -n internal`? I know you did a get with `--all-namespaces` before... but are there any resources coming back with the "internal" namespace? – Old Schooled Jun 11 '19 at 13:25
  • It shows running pods for other deployments but not for the deployment I am looking for, so it does not show pods for the peer deployment. – Pankaj Cheema Jun 11 '19 at 15:26
  • I think we are misunderstanding each other. What are you seeing in Azure Portal in the places I've described? There is something super simple that we're missing here. How are you creating your deployment? Command line `create` or `apply`? What yaml, exactly, are you trying to deploy? Show us. What happens, exactly, when you attempt the deploy? No error messages? Please be more specific when you reply. Instead of 'it shows running pods...' something like, 'When I run `kubectl get pods -n internal`, it shows...' would be more helpful to understand what piece of advice you're working with. – Old Schooled Jun 11 '19 at 15:35
  • If your deployment shows no events and does not even attempt to spin up pods, I'd assume there's something wrong with your deployment. Gathering more information would be the first step to finding the error (the Azure Portal stuff I keep asking about). You can also easily test whether your deployment is broken. Try deploying something super simple: grab the simplest yaml from Google or write one, and deploy it to your cluster. Do those pods come up? They should, and if they do, that would mean the cluster is fine and the deployment on your side is what's messed up. – Old Schooled Jun 11 '19 at 15:42
  • I cannot deploy nginx either – TechChain Jun 11 '19 at 16:04
  • OK, then you have to dig deeper. What happens exactly? What method are you using for deployment? What does the describe on the nginx deployment say? What does it say in the Azure Portal? What debugging have you tried? What happened there? With each attempt, we need more information. – Old Schooled Jun 11 '19 at 16:19
  • Details, details, details. I really want to help, however, I keep asking the same questions and I'm not getting enough answers to really figure it out. What does it say in Azure? Azure Portal is your friend! (sometimes.) I've included a nifty little diagram that shows you exactly where and how to look. Azure will tell you when and why deployments fail. – Old Schooled Jun 11 '19 at 16:19

It is the responsibility of the kube-apiserver (a Kubernetes control plane component) to serve your API requests, for example `kubectl create ...` or `kubectl scale ...`. Actually maintaining those Kubernetes resources in the desired state is the job of the kube-controller-manager (another control plane component), and scheduling those resources onto nodes is the job of the kube-scheduler (a third control plane component).

Given the above, and assuming (I think) you are using managed Kubernetes, those components are managed by your cloud provider. But from my (on-premises Kubernetes) experience I can say that if your deployment commands execute correctly yet nothing is created, the kube-apiserver is working but the kube-controller-manager is not functioning correctly. Likewise, if the pods show up but get stuck in a creating status, it is the kube-scheduler that is not doing its job.

All in all, it is worth checking the logs of the kube-controller-manager and the kube-scheduler.
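On a kubeadm-style cluster, where these components run as static pods, a sketch of pulling those logs (managed offerings may not expose these pods at all):

kubectl -n kube-system get pods
kubectl -n kube-system logs -l component=kube-controller-manager --tail=100
kubectl -n kube-system logs -l component=kube-scheduler --tail=100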

garlicFrancium
  • We are using Rancher. Sometimes we get alerts that the controller manager is not healthy – Pankaj Cheema Jun 11 '19 at 15:24
  • @PankajCheema When the controller manager is not healthy, the following will happen (for example): your `kubectl create/scale ...` command will execute successfully, but your resources (workloads/pods) won't actually be created or scaled. – garlicFrancium Jun 11 '19 at 21:00

I faced a similar situation when using Docker Desktop on my Mac and overcame it by increasing the Docker resources in Docker Desktop Preferences...

(Screenshot of the Resources settings in Docker Desktop Preferences.)

So, try increasing your Kubernetes cluster's resources.
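To see whether the nodes are actually resource-constrained, something like the following helps (the `kubectl top` command assumes the metrics-server add-on is installed):

kubectl describe nodes | grep -A 5 "Allocated resources"
kubectl top nodes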

Naga Vijayapuram

I had this problem on another cloud provider.

The issue turned out to be a bug on their end: the nodes were misconfigured and had been dropped from the cluster. This is the command that helped me:

$ kubectl get nodes
No resources found

I then manually SSH'd into a node and ran `journalctl -u kubelet`:

Oct 09 03:43:12 node69114-107110-62fb533ca771 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Oct 09 03:43:12 node69114-107110-62fb533ca771 kubelet[743]: Flag --cloud-provider has been deprecated, will be removed in 1.24 or later, in favor of removing cloud provider code from Kubelet.
Oct 09 03:43:12 node69114-107110-62fb533ca771 kubelet[743]: E1009 03:43:12.391628     743 server.go:205] "Failed to load kubelet config file" err="failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file \"/var/lib/kubelet/config.yaml\", error: open /var/lib/kubelet/config.yaml: no such file or directory" path="/var/lib/kubelet/config.yaml"
Oct 09 03:43:12 node69114-107110-62fb533ca771 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Oct 09 03:43:12 node69114-107110-62fb533ca771 systemd[1]: kubelet.service: Failed with result 'exit-code'.

I found that the folder /var/lib/kubelet/ was missing, which indicates that `kubeadm init` failed to execute. I tried to execute it myself and could not. After gathering this info, the cloud provider quickly escalated and fixed the problem.
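For anyone hitting something similar, a quick node-side sanity check might look like this (a sketch, assuming a systemd-based node bootstrapped with kubeadm):

# Is the kubelet running, and does its config directory exist?
systemctl status kubelet
ls /var/lib/kubelet/
journalctl -u kubelet --since "1 hour ago"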