
I have two machines running CoreOS beta (1185.2.0).

I install Kubernetes with rkt containers using a modified script; the original script is at https://github.com/coreos/coreos-kubernetes/tree/master/multi-node/generic, and my modified version is at https://github.com/kfirufk/coreos-kubernetes-multi-node-generic-install-script.

The environment variables that I set for the script are:

ADVERTISE_IP=10.79.218.2
ETCD_ENDPOINTS="https://coreos-2.tux-in.com:2379,https://coreos-3.tux-in.com:2379"
K8S_VER=v1.4.3_coreos.0
HYPERKUBE_IMAGE_REPO=quay.io/coreos/hyperkube
POD_NETWORK=10.2.0.0/16
SERVICE_IP_RANGE=10.3.0.0/24
K8S_SERVICE_IP=10.3.0.1
DNS_SERVICE_IP=10.3.0.10
USE_CALICO=true
CONTAINER_RUNTIME=rkt
ETCD_CERT_FILE="/etc/ssl/etcd/etcd1.pem"
ETCD_KEY_FILE="/etc/ssl/etcd/etcd1-key.pem"
ETCD_TRUSTED_CA_FILE="/etc/ssl/etcd/ca.pem"
ETCD_CLIENT_CERT_AUTH=true
OVERWRITE_ALL_FILES=true
CONTROLLER_HOSTNAME="coreos-2.tux-in.com"
ETCD_CERT_ROOT_DIR="/etc/ssl/etcd"
ETCD_SCHEME="https"
ETCD_AUTHORITY="coreos-2.tux-in.com:2379"
IS_MASK_UPDATE_ENGINE=false

The most notable changes are added support for etcd2 TLS certificates and the use of a kubeconfig YAML file instead of the deprecated --api-servers flag.
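Roughly, the kubelet change amounts to swapping the deprecated flag for a kubeconfig reference; a simplified sketch (other flags omitted, and the old line reflects how I understand the original script behaved):

# before: deprecated flag pointing the kubelet directly at the apiserver
ExecStart=/usr/lib/coreos/kubelet-wrapper --api-servers=https://coreos-2.tux-in.com:443 ...

# after: the apiserver address and client certs come from a kubeconfig file
ExecStart=/usr/lib/coreos/kubelet-wrapper --kubeconfig=/etc/kubernetes/controller-kubeconfig.yaml ...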

Currently I'm trying to install using the controller script on coreos-2.tux-in.com.

The kubeconfig YAML for the controller node contains:

current-context: tuxin-coreos-context
apiVersion: v1
clusters:
- cluster:
    certificate-authority: /etc/kubernetes/ssl/ca.pem
    server: https://coreos-2.tux-in.com:443
  name: tuxin-coreos-cluster
contexts:
- context:
    cluster: tuxin-coreos-cluster
  name: tuxin-coreos-context
kind: Config
preferences:
  colors: true
users:
- name: kubelet
  user:
    client-certificate: /etc/kubernetes/ssl/apiserver.pem
    client-key: /etc/kubernetes/ssl/apiserver-key.pem
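As a quick sanity check of this file (standard kubectl config subcommands, not part of the install script):

# render the parsed config and confirm the cluster/user entries above resolve
kubectl --kubeconfig=/etc/kubernetes/controller-kubeconfig.yaml config view

# should print tuxin-coreos-context
kubectl --kubeconfig=/etc/kubernetes/controller-kubeconfig.yaml config current-context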

The generated kubelet.service file contains:

[Service]
Environment=KUBELET_VERSION=v1.4.3_coreos.0
Environment=KUBELET_ACI=quay.io/coreos/hyperkube
Environment="RKT_OPTS=--volume dns,kind=host,source=/etc/resolv.conf   --mount volume=dns,target=/etc/resolv.conf   --volume rkt,kind=host,source=/opt/bin/host-rkt   --mount volume=rkt,target=/usr/bin/rkt   --volume var-lib-rkt,kind=host,source=/var/lib/rkt   --mount volume=var-lib-rkt,target=/var/lib/rkt   --volume stage,kind=host,source=/tmp   --mount volume=stage,target=/tmp   --volume var-log,kind=host,source=/var/log   --mount volume=var-log,target=/var/log"
ExecStartPre=/usr/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/usr/bin/mkdir -p /var/log/containers
ExecStart=/usr/lib/coreos/kubelet-wrapper   --kubeconfig=/etc/kubernetes/controller-kubeconfig.yaml   --register-schedulable=false   --cni-conf-dir=/etc/kubernetes/cni/net.d   --network-plugin=cni   --container-runtime=rkt   --rkt-path=/usr/bin/rkt   --rkt-stage1-image=coreos.com/rkt/stage1-coreos   --allow-privileged=true   --pod-manifest-path=/etc/kubernetes/manifests   --hostname-override=10.79.218.2   --cluster_dns=10.3.0.10   --cluster_domain=cluster.local
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
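For reference, after regenerating the unit I reload and restart it the standard systemd way and follow the log for startup errors:

sudo systemctl daemon-reload
sudo systemctl restart kubelet
journalctl -u kubelet -f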

Now, I'm pretty sure it's related to using --kubeconfig instead of --api-servers, because I started getting this error only after that change.

The kubelet log output is at http://pastebin.com/eD8TrMJJ.

The kubelet is not set up properly now; when I run kubectl get nodes on my desktop, it returns an empty list.
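For comparison, on a healthy cluster I'd expect both machines to show up, roughly like this (illustrative output; the AGE values are made up, and the controller would show SchedulingDisabled because of --register-schedulable=false):

$ kubectl get nodes
NAME          STATUS                     AGE
10.79.218.2   Ready,SchedulingDisabled   1d
10.79.218.3   Ready                      1d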

Any ideas?

Update

Output of kubectl get nodes --v=8 is at http://pastebin.com/gDBbn0rn.

Update

Output of etcdctl ls /registry/minions:

Error:  100: Key not found (/registry/minions) [42662]
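For reference, on a working cluster that key normally contains one entry per registered node (illustrative output, assuming nodes register under their --hostname-override IPs):

$ etcdctl ls /registry/minions
/registry/minions/10.79.218.2
/registry/minions/10.79.218.3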

Output of ps -aef | grep kubelet on the controller:

root      2054     1  3 12:49 ?        00:18:06 /kubelet --kubeconfig=/etc/kubernetes/controller-kubeconfig.yaml --register-schedulable=false --cni-conf-dir=/etc/kubernetes/cni/net.d --network-plugin=cni --container-runtime=rkt --rkt-path=/usr/bin/rkt --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --hostname-override=10.79.218.2 --cluster_dns=10.3.0.10 --cluster_domain=cluster.local
root      2605     1  0 12:51 ?        00:00:00 stage1/rootfs/usr/lib/ld-linux-x86-64.so.2 stage1/rootfs/usr/bin/systemd-nspawn --boot --register=true --link-journal=try-guest --keep-unit --quiet --uuid=b7008337-7b90-4fd7-8f1f-7bc45f056685 --machine=rkt-b7008337-7b90-4fd7-8f1f-7bc45f056685 --directory=stage1/rootfs --bind=/var/lib/kubelet/pods/52646008312b398ac0d3031ad8b9e280/containers/kube-scheduler/ce639294-9f68-11e6-a3bd-1c6f653e6f72:/opt/stage2/kube-scheduler/rootfs/dev/termination-log --bind=/var/lib/kubelet/pods/52646008312b398ac0d3031ad8b9e280/containers/kube-scheduler/etc-hosts:/opt/stage2/kube-scheduler/rootfs/etc/hosts --bind=/var/lib/kubelet/pods/52646008312b398ac0d3031ad8b9e280/containers/kube-scheduler/etc-resolv-conf:/opt/stage2/kube-scheduler/rootfs/etc/resolv.conf --capability=CAP_AUDIT_WRITE,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FSETID,CAP_FOWNER,CAP_KILL,CAP_MKNOD,CAP_NET_RAW,CAP_NET_BIND_SERVICE,CAP_SETUID,CAP_SETGID,CAP_SETPCAP,CAP_SETFCAP,CAP_SYS_CHROOT -- --default-standard-output=tty --log-target=null --show-status=0
root      2734     1  0 12:51 ?        00:00:00 stage1/rootfs/usr/lib/ld-linux-x86-64.so.2 stage1/rootfs/usr/bin/systemd-nspawn --boot --register=true --link-journal=try-guest --keep-unit --quiet --uuid=ee6be263-c4ed-4a70-879c-57e2dde4ab7a --machine=rkt-ee6be263-c4ed-4a70-879c-57e2dde4ab7a --directory=stage1/rootfs --bind=/var/lib/kubelet/pods/0c997ab29f8d032a29a952f578d9014c/containers/kube-apiserver/ceb3598e-9f68-11e6-a3bd-1c6f653e6f72:/opt/stage2/kube-apiserver/rootfs/dev/termination-log --bind=/var/lib/kubelet/pods/0c997ab29f8d032a29a952f578d9014c/containers/kube-apiserver/etc-hosts:/opt/stage2/kube-apiserver/rootfs/etc/hosts --bind=/var/lib/kubelet/pods/0c997ab29f8d032a29a952f578d9014c/containers/kube-apiserver/etc-resolv-conf:/opt/stage2/kube-apiserver/rootfs/etc/resolv.conf --bind-ro=/etc/ssl/etcd:/opt/stage2/kube-apiserver/rootfs/etc/ssl/etcd --bind-ro=/etc/kubernetes/ssl:/opt/stage2/kube-apiserver/rootfs/etc/kubernetes/ssl --bind-ro=/usr/share/ca-certificates:/opt/stage2/kube-apiserver/rootfs/etc/ssl/certs --capability=CAP_AUDIT_WRITE,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FSETID,CAP_FOWNER,CAP_KILL,CAP_MKNOD,CAP_NET_RAW,CAP_NET_BIND_SERVICE,CAP_SETUID,CAP_SETGID,CAP_SETPCAP,CAP_SETFCAP,CAP_SYS_CHROOT -- --default-standard-output=tty --log-target=null --show-status=0
root      2760     1  0 12:51 ?        00:00:00 stage1/rootfs/usr/lib/ld-linux-x86-64.so.2 stage1/rootfs/usr/bin/systemd-nspawn --boot --register=true --link-journal=try-guest --keep-unit --quiet --uuid=6a9e6598-3c1d-4563-bbdf-4ca1774f8f83 --machine=rkt-6a9e6598-3c1d-4563-bbdf-4ca1774f8f83 --directory=stage1/rootfs --bind-ro=/etc/kubernetes/ssl:/opt/stage2/kube-controller-manager/rootfs/etc/kubernetes/ssl --bind-ro=/usr/share/ca-certificates:/opt/stage2/kube-controller-manager/rootfs/etc/ssl/certs --bind=/var/lib/kubelet/pods/11d558df35524947fb7ed66cf7bed0eb/containers/kube-controller-manager/cebd2d3d-9f68-11e6-a3bd-1c6f653e6f72:/opt/stage2/kube-controller-manager/rootfs/dev/termination-log --bind=/var/lib/kubelet/pods/11d558df35524947fb7ed66cf7bed0eb/containers/kube-controller-manager/etc-hosts:/opt/stage2/kube-controller-manager/rootfs/etc/hosts --bind=/var/lib/kubelet/pods/11d558df35524947fb7ed66cf7bed0eb/containers/kube-controller-manager/etc-resolv-conf:/opt/stage2/kube-controller-manager/rootfs/etc/resolv.conf --capability=CAP_AUDIT_WRITE,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FSETID,CAP_FOWNER,CAP_KILL,CAP_MKNOD,CAP_NET_RAW,CAP_NET_BIND_SERVICE,CAP_SETUID,CAP_SETGID,CAP_SETPCAP,CAP_SETFCAP,CAP_SYS_CHROOT -- --default-standard-output=tty --log-target=null --show-status=0
root      3861     1  0 12:53 ?        00:00:00 stage1/rootfs/usr/lib/ld-linux-x86-64.so.2 stage1/rootfs/usr/bin/systemd-nspawn --boot --register=true --link-journal=try-guest --keep-unit --quiet --uuid=3dad014c-b31f-4e11-afb7-59214a7a4de9 --machine=rkt-3dad014c-b31f-4e11-afb7-59214a7a4de9 --directory=stage1/rootfs --bind=/var/lib/kubelet/pods/7889fbb0a1c86d9bfdb12908938dee20/containers/kube-policy-controller/etc-hosts:/opt/stage2/kube-policy-controller/rootfs/etc/hosts --bind=/var/lib/kubelet/pods/7889fbb0a1c86d9bfdb12908938dee20/containers/kube-policy-controller/etc-resolv-conf:/opt/stage2/kube-policy-controller/rootfs/etc/resolv.conf --bind-ro=/etc/ssl/etcd:/opt/stage2/kube-policy-controller/rootfs/etc/ssl/etcd --bind=/var/lib/kubelet/pods/7889fbb0a1c86d9bfdb12908938dee20/containers/kube-policy-controller/dfd7a7dc-9f68-11e6-a3bd-1c6f653e6f72:/opt/stage2/kube-policy-controller/rootfs/dev/termination-log --capability=CAP_AUDIT_WRITE,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FSETID,CAP_FOWNER,CAP_KILL,CAP_MKNOD,CAP_NET_RAW,CAP_NET_BIND_SERVICE,CAP_SETUID,CAP_SETGID,CAP_SETPCAP,CAP_SETFCAP,CAP_SYS_CHROOT --bind=/var/lib/kubelet/pods/7889fbb0a1c86d9bfdb12908938dee20/containers/leader-elector/etc-hosts:/opt/stage2/leader-elector/rootfs/etc/hosts --bind=/var/lib/kubelet/pods/7889fbb0a1c86d9bfdb12908938dee20/containers/leader-elector/etc-resolv-conf:/opt/stage2/leader-elector/rootfs/etc/resolv.conf --bind=/var/lib/kubelet/pods/7889fbb0a1c86d9bfdb12908938dee20/containers/leader-elector/f9e65e21-9f68-11e6-a3bd-1c6f653e6f72:/opt/stage2/leader-elector/rootfs/dev/termination-log --capability=CAP_AUDIT_WRITE,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FSETID,CAP_FOWNER,CAP_KILL,CAP_MKNOD,CAP_NET_RAW,CAP_NET_BIND_SERVICE,CAP_SETUID,CAP_SETGID,CAP_SETPCAP,CAP_SETFCAP,CAP_SYS_CHROOT -- --default-standard-output=tty --log-target=null --show-status=0

Output of ps -aef | grep kubelet on the worker:

root      2092     1  0 12:56 ?        00:03:56 /kubelet --cni-conf-dir=/etc/kubernetes/cni/net.d --network-plugin=cni --container-runtime=rkt --rkt-path=/usr/bin/rkt --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --hostname-override=10.79.218.3 --cluster_dns=10.3.0.10 --cluster_domain=cluster.local --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem

Update

When I run journalctl -f -u kubelet, I notice the following message every 10 seconds:

Nov 02 13:01:54 coreos-2.tux-in.com kubelet-wrapper[1751]: I1102 13:01:54.360929    1751 kubelet_node_status.go:203] Setting node annotation to enable volume controller attach/detach

Which service is this message related to? Maybe something is restarting itself every 10 seconds because of some kind of failure.
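A few things I can try to narrow this down (hypothetical diagnostic commands; the grep pattern is just an example):

# is the kubelet unit itself flapping? RestartSec=10 in the unit above matches the interval
systemctl status kubelet

# any errors around the repeating message?
journalctl -u kubelet --since "10 min ago" | grep -iE 'error|fail'

# are the control-plane pods being recreated with new UUIDs every cycle?
rkt list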

  • Can you paste the output of "kubectl get nodes --v=8"? – Hrishikesh Kumar Oct 24 '16 at 15:01
  • @HrishikeshKumar - updated main post. Thanks. – ufk Oct 28 '16 at 13:51
  • kubectl is able to contact the apiserver, and the apiserver is returning an empty response. – Hrishikesh Kumar Oct 31 '16 at 10:34
  • @HrishikeshKumar - thank you. Any ideas why? Or how to investigate this issue further? – ufk Oct 31 '16 at 10:35
  • Can you issue the command `etcdctl ls /registry/minions` on your master? It should list all the worker nodes; in your case, it might return empty. I am guessing the worker has not registered with the master correctly. The kubelet on the worker should register itself with the master when started; maybe it is not able to find the apiserver after your changes. Search for a file called kubelet.log - /var/log/kubernetes/kubelet.log - and look for any errors. Also, the apiserver should be in the parameters. You can do a ps -aef to check if there are any kubelet processes running. – Hrishikesh Kumar Oct 31 '16 at 10:53
  • @ufk: I work on Ubuntu and am not familiar with CoreOS. If you have a kubelet conf file, you may like to explicitly add the option `--api-servers=https://10.79.218.2:443` and check. – Hrishikesh Kumar Oct 31 '16 at 11:11
  • @HrishikeshKumar --api-servers is a deprecated option, and I'm trying to avoid using it. With that option enabled and without a kubeconfig, it works :) – ufk Oct 31 '16 at 11:16
  • @ufk: ok :) Can you paste the output of `ps -aef | grep kubelet` on both your master and worker? (I will try to compare it with what I have for Ubuntu.) – Hrishikesh Kumar Oct 31 '16 at 11:25
  • @HrishikeshKumar - updated main post again :) – ufk Oct 31 '16 at 20:31

1 Answer


The --require-kubeconfig option for the kubelet should help.
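In the generated unit above, that would mean adding the flag next to --kubeconfig, roughly (a sketch adapted from the question's ExecStart, other flags omitted):

ExecStart=/usr/lib/coreos/kubelet-wrapper \
  --kubeconfig=/etc/kubernetes/controller-kubeconfig.yaml \
  --require-kubeconfig \
  ...

As I understand it, without --require-kubeconfig a kubelet of this vintage treats --kubeconfig as optional and can fall back to its default local apiserver address, so it never registers with the real apiserver; the flag forces it to use the server from the kubeconfig.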
