
I am trying to use the Kubernetes 1.7.12 fluentd-elasticsearch addon: https://github.com/kubernetes/kubernetes/tree/v1.7.12/cluster/addons/fluentd-elasticsearch

ElasticSearch starts up and can respond with:

{
 "name" : "0322714ad5b7",
 "cluster_name" : "kubernetes-logging",
 "cluster_uuid" : "_na_",
 "version" : {
   "number" : "2.4.1",
   "build_hash" : "c67dc32e24162035d18d6fe1e952c4cbcbe79d16",
   "build_timestamp" : "2016-09-27T18:57:55Z",
   "build_snapshot" : false,
   "lucene_version" : "5.5.2"
 },
 "tagline" : "You Know, for Search"
}

But Kibana is still unable to connect to it. The connection error starts out with:

{"type":"log","@timestamp":"2018-01-23T07:42:06Z","tags":["warning","elasticsearch"],"pid":6,"message":"Unable to revive connection: http://elasticsearch-logging:9200/"}
{"type":"log","@timestamp":"2018-01-23T07:42:06Z","tags":["warning","elasticsearch"],"pid":6,"message":"No living connections"}

And after ElasticSearch is up, the error changes to:

{"type":"log","@timestamp":"2018-01-23T07:42:08Z","tags":["status","plugin:elasticsearch@1.0.0","error"],"pid":6,"state":"red","message":"Status changed from red to red - Service Unavailable","prevState":"red","prevMsg":"Unable to connect to Elasticsearch at http://elasticsearch-logging:9200."}

So it seems that Kibana is finally able to get a response from ElasticSearch, but it still cannot establish a working connection.

(A screenshot of the Kibana dashboard was attached here; the image is not available.)

I tried to get the logs to output more information, but do not have enough knowledge about Kibana and ElasticSearch to know what else I can try next.

I am able to reproduce the error locally using this docker-compose.yml:

version: '2'
services:
 elasticsearch-logging:
   image: gcr.io/google_containers/elasticsearch:v2.4.1-2
   ports:
     - "9200:9200"
     - "9300:9300"

 kibana-logging:
   image: gcr.io/google_containers/kibana:v4.6.1-1
   ports:
     - "5601:5601"
   depends_on:
     - elasticsearch-logging
   environment:
     - ELASTICSEARCH_URL=http://elasticsearch-logging:9200

From what I can tell from the question "Kibana on Docker cannot connect to Elasticsearch" and this blog post: https://gunith.github.io/docker-kibana-elasticsearch/, there shouldn't be much involved.

But I can't figure out what I'm missing.

Any ideas what else I might be able to try?

Thank you for your time. :)

Update 1:

Running curl against http://elasticsearch-logging on the Kubernetes cluster returned the same kind of output:

{
  "name" : "elasticsearch-logging-v1-68km4",
  "cluster_name" : "kubernetes-logging",
  "cluster_uuid" : "_na_",
  "version" : {
    "number" : "2.4.1",
    "build_hash" : "c67dc32e24162035d18d6fe1e952c4cbcbe79d16",
    "build_timestamp" : "2016-09-27T18:57:55Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.2"
  },
  "tagline" : "You Know, for Search"
}

curling http://elasticsearch-logging/_cat/indices?pretty on the Kubernetes cluster timed out because of a proxy rule. Using the docker-compose.yml and curling locally (e.g. curl localhost:9200/_cat/indices?pretty) results in:

{
  "error" : {
    "root_cause" : [ {
      "type" : "master_not_discovered_exception",
      "reason" : null
    } ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}

The docker-compose logs show:

[2018-01-23 17:04:39,110][DEBUG][action.admin.cluster.state] [ac1f2a13a637] no known master node, scheduling a retry

[2018-01-23 17:05:09,112][DEBUG][action.admin.cluster.state] [ac1f2a13a637] timed out while retrying [cluster:monitor/state] after failure (timeout [30s])
[2018-01-23 17:05:09,116][WARN ][rest.suppressed          ] path: /_cat/indices, params: {pretty=}
MasterNotDiscoveredException[null]
     at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:234)
     at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:236)
     at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:804)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     at java.lang.Thread.run(Thread.java:745)
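The two outputs above suggest that the root endpoint answers even when no master has been elected (note the `cluster_uuid` of `_na_`), while `_cat/indices` fails with a 503. A minimal sketch for telling these cases apart when scripting the check; the function name `classify_es_response` and its output labels are made up for illustration and are not part of Elasticsearch:

```shell
# Classify an Elasticsearch response body read from stdin.
# Prints one of: empty, no_master, error, ok
# (These labels are illustrative, not an Elasticsearch convention.)
classify_es_response() {
  body=$(cat)
  case "$body" in
    "") echo "empty" ;;
    *master_not_discovered_exception*) echo "no_master" ;;
    *'"error"'*) echo "error" ;;
    *) echo "ok" ;;
  esac
}

# Usage, assuming the docker-compose setup from the question is running:
#   curl -s 'localhost:9200/_cat/indices?pretty' | classify_es_response
```

A `no_master` result means the node is up but has not joined or formed a cluster, so the next place to look is the Elasticsearch container log rather than Kibana.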

Update 2: Running kubectl --namespace kube-system logs -c kubedns po/kube-dns-667321983-dt5lz --tail 50 --follow yields:

I0124 16:43:33.591112       5 dns.go:264] New service: kibana-logging
I0124 16:43:33.591225       5 dns.go:264] New service: nginx
I0124 16:43:33.591251       5 dns.go:264] New service: registry
I0124 16:43:33.591274       5 dns.go:264] New service: sudoe
I0124 16:43:33.591295       5 dns.go:264] New service: default-http-backend
I0124 16:43:33.591317       5 dns.go:264] New service: kube-dns
I0124 16:43:33.591344       5 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0124 16:43:33.591369       5 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0124 16:43:33.591390       5 dns.go:264] New service: kubernetes
I0124 16:43:33.591409       5 dns.go:462] Added SRV record &{Host:kubernetes.default.svc.cluster.local. Port:443 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0124 16:43:33.591429       5 dns.go:264] New service: elasticsearch-logging

Update 3:

I'm still trying to get everything to work, but with the help of others, I am confident it is an RBAC issue. I'm not completely sure, but it looks like the Elasticsearch nodes were not able to discover a master (which I never knew was even needed) because the pod's service account lacked the permissions the discovery step needs.

Here are the outputs that helped narrow it down, in case they help others starting out:

With RBAC on:

# kubectl --kubeconfig kubeconfig.yaml --namespace kube-system logs po/elasticsearch-logging-v1-wkwcs
F0119 00:18:44.285773       9 elasticsearch_logging_discovery.go:60] kube-system namespace doesn't exist: User "system:serviceaccount:kube-system:default" cannot get namespaces in the namespace "kube-system". (get namespaces kube-system)
goroutine 1 [running]:
k8s.io/kubernetes/vendor/github.com/golang/glog.stacks(0x1f7f600, 0xc400000000, 0xee, 0x1b2)
        vendor/github.com/golang/glog/glog.go:766 +0xa5
k8s.io/kubernetes/vendor/github.com/golang/glog.(*loggingT).output(0x1f5f5c0, 0xc400000003, 0xc42006c300, 0x1ef20c8, 0x22, 0x3c, 0x0)
        vendor/github.com/golang/glog/glog.go:717 +0x337
k8s.io/kubernetes/vendor/github.com/golang/glog.(*loggingT).printf(0x1f5f5c0, 0xc400000003, 0x16949d6, 0x1e, 0xc420579ee8, 0x2, 0x2)
        vendor/github.com/golang/glog/glog.go:655 +0x14c
k8s.io/kubernetes/vendor/github.com/golang/glog.Fatalf(0x16949d6, 0x1e, 0xc420579ee8, 0x2, 0x2)
        vendor/github.com/golang/glog/glog.go:1145 +0x67
main.main()
        cluster/addons/fluentd-elasticsearch/es-image/elasticsearch_logging_discovery.go:60 +0xb53
[2018-01-19 00:18:45,273][INFO ][node                     ] [elasticsearch-logging-v1-wkwcs] version[2.4.1], pid[5], build[c67dc32/2016-09-27T18:57:55Z]
[2018-01-19 00:18:45,275][INFO ][node                     ] [elasticsearch-logging-v1-wkwcs] initializing ...
# kubectl --kubeconfig kubeconfig.yaml --namespace kube-system exec kibana-logging-2104905774-69wgv curl elasticsearch-logging.kube-system:9200/_cat/indices?pretty

{
  "error" : {
    "root_cause" : [ {
      "type" : "master_not_discovered_exception",
      "reason" : null
    } ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}

With RBAC off:

#  kubectl --kubeconfig kubeconfig.yaml --namespace kube-system log elasticsearch-logging-v1-7shgk
[2018-01-26 01:19:52,294][INFO ][node                     ] [elasticsearch-logging-v1-7shgk] version[2.4.1], pid[5], build[c67dc32/2016-09-27T18:57:55Z]
[2018-01-26 01:19:52,294][INFO ][node                     ] [elasticsearch-logging-v1-7shgk] initializing ...
[2018-01-26 01:19:53,077][INFO ][plugins                  ] [elasticsearch-logging-v1-7shgk] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
#  kubectl --kubeconfig kubeconfig.yaml --namespace kube-system exec elasticsearch-logging-v1-7shgk curl http://elasticsearch-logging:9200/_cat/indices?pretty
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    40  100    40    0     0      2      0  0:00:20  0:00:15  0:00:05    10
green open .kibana 1 1 1 0 6.2kb 3.1kb 
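The fatal log line in the RBAC-on case says that `system:serviceaccount:kube-system:default` cannot get namespaces. One way to confirm this without restarting pods is `kubectl auth can-i` with impersonation. The sketch below assumes your own kubeconfig user is allowed to impersonate service accounts (cluster admins usually are); the helper name `sa_can_i` is made up for illustration:

```shell
# kubectl command to use; override to add flags, e.g.
#   KUBECTL="kubectl --kubeconfig kubeconfig.yaml"
KUBECTL=${KUBECTL:-kubectl}

# Succeeds (exit 0) if the given service account may perform VERB on RESOURCE.
# Arguments: namespace, service account name, verb, resource
sa_can_i() {
  ns=$1; sa=$2; verb=$3; resource=$4
  # "kubectl auth can-i" prints "yes" or "no"; only the printed answer is used here.
  answer=$($KUBECTL auth can-i "$verb" "$resource" \
    --as="system:serviceaccount:${ns}:${sa}" 2>/dev/null) || true
  [ "$answer" = "yes" ]
}

# Usage, matching the failing call from the log above:
#   sa_can_i kube-system default get namespaces || echo "missing permission"
```

If the check fails, the discovery helper in the Elasticsearch image would be expected to hit the same error shown in the log, which is consistent with the `master_not_discovered_exception` that follows.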

Thanks everyone for your help :)

Zhao Li
  • Are elasticsearch and kibana deployed in the same namespace? Could you access the kibana container via a command line and launch some debugging commands? – whites11 Jan 23 '18 at 08:11
  • @whites11, yes they are deployed to the same namespace, `kube-system`. I can do something like `kubectl exec -it po/podname`. Is that what you mean? What kind of debugging commands can I run? – Zhao Li Jan 23 '18 at 08:17
  • yeah that's what I mean. Try running `curl http://elasticsearch-logging:9200` from the kibana pod – whites11 Jan 23 '18 at 08:18
  • I'll try it on the kubernetes cluster tomorrow, but when I run it in the container using the `docker-compose.yml`, I get this: { "name" : "0322714ad5b7", "cluster_name" : "kubernetes-logging", "cluster_uuid" : "_na_", "version" : { "number" : "2.4.1", "build_hash" : "c67dc32e24162035d18d6fe1e952c4cbcbe79d16", "build_timestamp" : "2016-09-27T18:57:55Z", "build_snapshot" : false, "lucene_version" : "5.5.2" }, "tagline" : "You Know, for Search" } – Zhao Li Jan 23 '18 at 08:19
  • Ok and what does `curl http://elasticsearch-logging:9200/_cat/indices?pretty` say? – whites11 Jan 23 '18 at 08:21
  • I remember trying out `curl http://elasticsearch-logging:9200/_cat/xyz` (but I forgot what `xyz` was). And when I did that, it said something about the master node. I'll try running that tomorrow as well and get back to you. What is the response supposed to look like? – Zhao Li Jan 23 '18 at 08:22
  • I meant to run literally: `curl http://elasticsearch-logging:9200/_cat/indices?pretty`. It is meant to return a list of all indices in the ES cluster and the relative status. – whites11 Jan 23 '18 at 08:23
  • Ok, I'll give that a go tomorrow and post back. Thanks for letting me know about that URL for troubleshooting. :) – Zhao Li Jan 23 '18 at 08:25
  • @whites11, I've tried out the commands and updated the question with the outputs. I'm going to dig into the `master_not_discovered_exception` error, but if you have any suggestions, I'm all ears. Thank you again for your time and help. :) – Zhao Li Jan 23 '18 at 17:54
  • That is the root of your problem. I have no clue what causes it, though; it's the first time I've seen this error – whites11 Jan 23 '18 at 17:56
  • @whites11 thank you so much for getting me to this point. I'll see if maybe I can go to a newer version of google's image for elasticsearch. Thank you again for all of your help :) – Zhao Li Jan 23 '18 at 18:18
  • @whites11, your guidance helped me figure this one out. If you want to add your troubleshooting tips to an answer, I can go ahead and accept it. Thank you again :) – Zhao Li Jan 26 '18 at 02:07
  • Glad to have been helpful, I added a very general answer, hope it fulfills your request. – whites11 Jan 26 '18 at 09:10

2 Answers


A few troubleshooting tips:

1) Ensure Elasticsearch is running fine.

Enter the container running Elasticsearch and run:

curl localhost:9200

You should get a JSON response with some data about Elasticsearch.

2) Ensure Elasticsearch is reachable from the Kibana container.

Enter the kibana container and run:

curl <elasticsearch_service_name>:9200

You should get the same output as above.

3) Ensure your ES indices are fine.

Run the following command from the elasticsearch container:

curl localhost:9200/_cat/indices?pretty

You should get a table with all indices in your ES cluster and their status (which should be green, or yellow if you run only a single ES instance, since replica shards then have nowhere to be allocated).

If any of the above checks fails, check the logs of your ES container for error messages and try to resolve them.
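The three checks above can be strung together in a small script. This is only a sketch: the function name `check_es`, its output labels, and the timeout values are assumptions chosen for illustration (the 40-second limit simply leaves room for the 30-second master retry timeout seen in the question's logs).

```shell
# curl command to use; can be overridden for testing.
CURL=${CURL:-curl}

# Run the checks against an Elasticsearch base URL, e.g. http://localhost:9200
# Prints "ok" or a label naming the first check that failed.
check_es() {
  base=$1
  # Checks 1 and 2: the root endpoint should return the JSON banner.
  banner=$($CURL -s --max-time 10 "$base/") || { echo "unreachable"; return 1; }
  case "$banner" in
    *'"tagline"'*) ;;
    *) echo "unexpected_banner"; return 1 ;;
  esac
  # Check 3: listing indices; an error body here points at cluster state problems.
  indices=$($CURL -s --max-time 40 "$base/_cat/indices") || { echo "indices_unreachable"; return 1; }
  case "$indices" in
    *'"error"'*) echo "indices_error"; return 1 ;;
  esac
  echo "ok"
}

# Usage:
#   check_es http://localhost:9200              # from the Elasticsearch container
#   check_es http://elasticsearch-logging:9200  # from the Kibana container
```

Running it from both containers separates "Elasticsearch is unhealthy" from "Kibana cannot reach Elasticsearch".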

whites11

This exception points to one of two misconfigurations:

1. The Kubernetes DNS addon is not working properly. Check your DNS addon logs.
2. Pod-to-pod communication is not working properly. This depends on your underlying SDN/CNI addon (e.g. flannel or Calico).

You can check by pinging one pod from another pod. If that does not work, then check your networking configuration, especially the kube-proxy component.
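A rough sketch of how these two checks could be run with `kubectl exec`. It assumes the container image ships `nslookup` and `curl` (the question shows `curl` in the Kibana and Elasticsearch images; `nslookup` may be missing), and it uses `curl` instead of `ping` for the pod-to-pod test for that reason. The function names and the pod names in the usage lines are placeholders:

```shell
# kubectl command to use; override to add flags such as --kubeconfig.
KUBECTL=${KUBECTL:-kubectl}

# List pods with their IPs, to pick a target for the pod-to-pod test.
list_pod_ips() {
  ns=$1
  $KUBECTL --namespace "$ns" get pods -o wide
}

# Resolve a service name from inside a pod (exercises the DNS addon).
check_dns_from_pod() {
  ns=$1; pod=$2; name=$3
  $KUBECTL --namespace "$ns" exec "$pod" -- nslookup "$name"
}

# Fetch a URL served by another pod from inside a pod (exercises pod networking).
check_pod_to_pod() {
  ns=$1; pod=$2; url=$3
  $KUBECTL --namespace "$ns" exec "$pod" -- curl -s --max-time 10 "$url"
}

# Usage (pod names and IP are placeholders):
#   list_pod_ips kube-system
#   check_dns_from_pod kube-system <kibana-pod> elasticsearch-logging
#   check_pod_to_pod kube-system <kibana-pod> http://<elasticsearch-pod-ip>:9200
```

If the DNS lookup fails, look at the kube-dns logs; if the lookup works but the direct pod IP request fails, the network addon is the more likely suspect.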

Pamir Erdem
  • Thanks for the tips. Sorry but I’m new to kubernetes, do you know what commands I can run to check those things? I’ll try and do some googling on it as well but me trying to find those commands would be slower. – Zhao Li Jan 24 '18 at 14:59
  • I updated the question with the dns logs. I don't quite know what I'm looking for though. I did some googling on cni flannel calico, but am not able to figure out how to access those logs. They don't seem to be operating the same way as the fluentd-elasticsearch addon and the dns addon. Thanks again for your time. – Zhao Li Jan 24 '18 at 16:58
  • Everything seems to work fine, especially DNS and networking. Do you have a chance to disable RBAC and deploy it as described at the link below? https://github.com/pires/kubernetes-elasticsearch-cluster. That setup does not use RBAC, so we can tell whether the problem is related to RBAC or not. If it works, then we can focus on RBAC. Is there an option to connect to your computer? – Pamir Erdem Jan 26 '18 at 10:54
  • thank you for the link and the offer to help troubleshoot the RBAC issue further. We are currently trying to use the later version of the add on configurations (1.9.2 https://github.com/kubernetes/kubernetes/tree/v1.9.2/cluster/addons/fluentd-elasticsearch). Hopefully the later configurations will have better support for RBAC. Thank you again for your help. – Zhao Li Jan 26 '18 at 17:00