18

I have a Kubernetes cluster that I set up with kube-aws. I'm trying to run a custom NGINX configuration that relies on DNS resolution in proxy_pass. Here is the NGINX block:

location /api/v1/lead {
  resolver 10.3.0.10 ipv6=off;
  set $container lead-api;
  proxy_pass http://$container:3000;
}

10.3.0.10 comes from the cluster IP of the DNS service found in Kubernetes. I've also tried 127.0.0.11 which is what we use in the docker-compose/docker environments.

$ kubectl describe --namespace=kube-system service kube-dns
Name:                   kube-dns
Namespace:              kube-system
Labels:                 k8s-app=kube-dns
                        kubernetes.io/cluster-service=true
                        kubernetes.io/name=KubeDNS
Selector:               k8s-app=kube-dns
Type:                   ClusterIP
IP:                     10.3.0.10
Port:                   dns     53/UDP
Endpoints:              10.2.26.61:53
Port:                   dns-tcp 53/TCP
Endpoints:              10.2.26.61:53
Session Affinity:       None

This configuration works well in three different environments that use docker-compose. However, I get the following error in the NGINX logs on the Kubernetes cluster:

[error] 9#9: *20 lead-api could not be resolved (2: Server failure), client: 10.2.26.0, server: , request: "GET /api/v1/lead/661DF757-722B-41BB-81BD-C7FD398BBC88 HTTP/1.1"

If I run nslookup within the NGINX pod I can resolve the host with the same dns server:

$ kubectl exec nginx-1855584872-kdiwh -- nslookup lead-api
Server:         10.3.0.10
Address:        10.3.0.10#53

Name:   lead-api.default.svc.cluster.local
Address: 10.3.0.167

I don't know if it matters or not, but notice the "server" part of the error is empty. When I look at the pod logs for dnsmasq I don't see anything relevant. If I change the NGINX block to hardcode the proxy_pass then it resolves fine. However, I have other configurations that require dynamic proxy names. I could hard code every upstream this way, but I want to know how to make the DNS resolver work.

location /api/v1/lead {
  proxy_pass http://lead-api:3000;
}
blockloop
  • You probably need to use the fully qualified name, i.e. lead-api.<namespace>.svc.cluster.local – MrE Nov 18 '16 at 21:05
  • By the way, why not use a Service instead of this? The Service will load balance from NGINX to whatever pods you have behind it. – MrE Nov 18 '16 at 21:07
  • I can nslookup from within the nginx container with just lead-api and it resolves just fine. Also, I have several backend APIs which are running individually which I want to run under a single url. I looked into using the ingress controller, but those were too complicated for what I was trying to accomplish. – blockloop Nov 18 '16 at 22:34
  • @MrE I updated the OP to show nslookup works – blockloop Nov 18 '16 at 22:39
  • I don't know how the nginx resolver works, but I know there are various ways to do it, and I have had many issues with DNS before, so I would not infer that because nslookup works, the nginx resolver should work. Try the FQDN in nginx to see if it helps. I'm still not sure what you're doing here exactly: lead-api is a service, right? So why do you need to use the resolver directive? – MrE Nov 18 '16 at 23:18
  • See here http://stackoverflow.com/a/32846603/903025 – blockloop Nov 19 '16 at 02:42
  • A Service will always be available, even if the endpoint is not. You don't need this. – MrE Nov 19 '16 at 02:58
  • If "lead-api" is not running or unavailable to be resolved at the time nginx starts up then nginx will die immediately. If, at any point, any of the upstreams become unavailable then NGINX will die. My question is about the resolution of DNS which does not work in NGINX, but works with nslookup. This same process works perfectly fine in a docker/compose environment. – blockloop Nov 19 '16 at 03:07
  • If you use a Kubernetes `Service` and start it before you start the nginx Pod, nginx will not die even if your lead-api service is not running. You are confusing the terminology: a Kubernetes `Service` is not a Pod or what you call a 'service' (your application); it is a load-balancing `proxy` to a number of Pods you run behind it. – MrE Nov 19 '16 at 03:53

3 Answers

39

Resolving the name fails because you need to use the fully qualified domain name (FQDN). That is, you should use:

lead-api.<namespace>.svc.cluster.local

not just

lead-api

Using just the hostname usually works because, in Kubernetes, a pod's resolv.conf is configured with search domains, so you don't normally need to provide a service's FQDN. For example:

search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.3.240.10
options ndots:5

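However, specifying the FQDN is necessary when you tell nginx to use a custom resolver. The nginx resolver queries the configured name server directly and does not read resolv.conf, so it does not get the benefit of these search domains. That is also why nslookup, which does use resolv.conf, can resolve the short name while nginx cannot.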
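
Applied to the configuration in the question, a minimal sketch of the fix could look like the following. It assumes the service lives in the default namespace (as the nslookup output in the question suggests) and that the cluster domain is cluster.local; adjust both for your cluster.

```nginx
location /api/v1/lead {
  # Cluster IP of the kube-dns service, taken from the question.
  resolver 10.3.0.10 ipv6=off;
  # Assumption: the "default" namespace and the "cluster.local" cluster domain.
  # The name must be fully qualified because nginx does not apply search domains.
  set $container lead-api.default.svc.cluster.local;
  proxy_pass http://$container:3000;
}
```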

joshperry
MrE
  • this was the issue with mine. It worked on one system where we deployed to default NS but not the other where we had a custom NS. To resolve it you can actually just put lead-api. assuming the rest is covered in your resolv.conf. – JamStar Sep 07 '17 at 18:29
  • Works like a charm! – rtindru Aug 15 '18 at 06:12
  • and what if you can't use FQDN, that is you don't know the namespace and the name of the cluster you are going to be deployed in ? This is a real issue. I have tried with removing the resolver definition in the nginx conf and it does not work better with short names :( – SeB.Fr Apr 23 '19 at 14:42
  • @SeB.Fr if you don't know what cluster you are deploying to, or the namespace you are deploying to, I think you have bigger problems. If you really want to make deployments that are cluster agnostic, then you need to use templates with variable interpolation. You can also use Init containers to re-write the nginx config on container startup. – MrE Apr 24 '19 at 03:45
  • We build software to be deployed on customer K8s cluster and required some flexibility in deployment. We are relying on helm charts and therefore the namespace is not an issue cause it is a builtin object in Helm. But the cluster name is not supported and by the way if you know of an K8s API to get that info I am all ears. – SeB.Fr Apr 25 '19 at 12:30
  • I'm using nginx `1.16.0` and a fully qualified name, yet I get the same error and I have to provide `resolver IP_ADDR` in my nginx config... have I missed anything? – xbmono Jun 18 '19 at 06:15
  • My DNS name is NOT in my cluster. It is an external URL. How do I resolve it? – Arrow_Raider Mar 29 '21 at 15:53
  • @SeB.Fr this is a common use case. As a workaround, you can inject the namespace as an env variable and templatize the nginx config. The nginx alpine Docker image does it for you at runtime. – Thomas Decaux May 18 '23 at 13:25
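
The templating workaround mentioned in the last comment could be sketched roughly as below. This assumes the official nginx Docker image (version 1.19 or later), which renders files under /etc/nginx/templates/ with envsubst at container start, and a hypothetical POD_NAMESPACE environment variable that you inject yourself (for example from the pod's metadata.namespace). The variable name, the file name, and the listen port are illustrative assumptions, not something from the original posts.

```nginx
# Hypothetical file: /etc/nginx/templates/default.conf.template
# The image's entrypoint writes the rendered result to /etc/nginx/conf.d/default.conf.
server {
  listen 80;

  location /api/v1/lead {
    resolver 10.3.0.10 ipv6=off;
    # ${POD_NAMESPACE} is substituted at container start (assumed env variable).
    # $container is left untouched as long as no env variable named "container" is defined.
    set $container lead-api.${POD_NAMESPACE}.svc.cluster.local;
    proxy_pass http://$container:3000;
  }
}
```

The cluster domain (cluster.local here) is still hardcoded in this sketch; it could be passed in the same way if it differs between clusters.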
4

Another option is to specify kube-dns by name as the resolver. On many systems, this would look something like the following:

resolver kube-dns.kube-system.svc.cluster.local valid=10s;
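
In context, that could look roughly like the sketch below. It assumes the DNS service is named kube-dns in the kube-system namespace (as in the question's kubectl output) and that the cluster domain is cluster.local. The upstream name still has to be fully qualified, since the nginx resolver does not apply search domains; the default namespace used here is taken from the nslookup output in the question.

```nginx
location /api/v1/lead {
  # Assumption: kube-dns in kube-system with the cluster.local domain.
  # valid=10s tells nginx to cache answers for 10 seconds instead of the record's TTL.
  resolver kube-dns.kube-system.svc.cluster.local valid=10s;
  set $container lead-api.default.svc.cluster.local;
  proxy_pass http://$container:3000;
}
```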
Greg
-4

You need to use a Service

http://kubernetes.io/docs/user-guide/services/

A Kubernetes Service proxies traffic to your Pods (i.e. what you call a 'service', which is your application).

I guess you use Kubernetes for the ability to deploy and scale your applications (Pods) so traffic will need to be load balanced to them once you scale and you have multiple Pods to talk to. This is what a Service does.

A Service has its own IP address. As long as the Service exists, an Nginx Pod referencing this Service as its upstream will work fine.

Nginx (free version) dies when it can't resolve the upstream, but if the Service is defined, it has its own IP and it gets resolved.

If the Pods behind the Service are not running, Nginx will not see that; it will try to forward the traffic and return a 502 (Bad Gateway).

So, just define the Service and then bring up your Pods with the proper label so the Service will pick them up. You can delete, scale, or replace those Pods without affecting the Nginx Pod. As long as there is at least one Pod running behind the Service, Nginx will always be able to connect to your API.

MrE
  • lead-api is a service and I understand how services work. I also understand that I can change (and already have) the config to hard code 'http://lead-api'. I understand that the upstream will work fine if I start lead-api service first. I have other dynamic nginx configurations that require the same resolver configuration. This is just an example. My question is not "how to make it work some other way" but "why is DNS resolver not working." – blockloop Nov 19 '16 at 04:13
  • did you try using the FQDN? – MrE Nov 19 '16 at 04:27
  • Down vote because this misses the point, a service is not the issue here at all. – Francis Upton IV Aug 15 '17 at 05:52
  • hence why after more details i wrote another answer which was accepted. – MrE Aug 15 '17 at 06:21