
I have a Python Gunicorn web application that throws the following error when it tries to resolve an internal DNS name through a CoreDNS server acting as a cache:

raise ConnectionError(e, request=request)\nrequests.exceptions.ConnectionError: HTTPConnectionPool(host='lb.consul.local', port=80): 
Max retries exceeded with url: /hello/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f414d5259b0>: 
Failed to establish a new connection: [Errno -2] Name or service not known',))"

I am able to resolve the same using dig:

dig @172.1.0.54 lb.consul.local

; <<>> DiG 9.9.5-9+deb8u16-Debian <<>> lb.consul.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58411
;; flags: qr rd; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;lb.consul.local. IN A

;; ANSWER SECTION:
lb.consul.local. 1 IN A 172.10.9.0

;; Query time: 1 msec
;; SERVER: 172.1.0.54#53(172.1.0.54)
;; WHEN: Wed Feb 20 02:43:47 UTC 2019
;; MSG SIZE  rcvd: 358

One thing to note is that the answer is not authoritative: the dig response flags are qr rd, with no aa. If I point /etc/resolv.conf back at the authoritative DNS server instead of the CoreDNS server acting as a cache, everything works again.

Does the requests library have trouble resolving names from non-authoritative sources, or is there a way to configure it to accept responses from non-authoritative DNS servers?

EDIT 20th Feb

The server the application is running on is configured correctly to speak to the dns server specified above:

root@server-test-7bff545c5b-42ln5:/app# cat /etc/resolv.conf
nameserver 172.1.0.54
search nstest.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5

EDIT 20th Feb 8:50 AM PST

I can reproduce this from a plain Python shell on the same machine by running the same lookup twice in quick succession:

>>> import socket
>>> socket.getaddrinfo('lb.consul.local', 80, 0, socket.SOCK_STREAM)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('172.10.9.0', 80))]
>>> socket.getaddrinfo('lb.consul.local', 80, 0, socket.SOCK_STREAM)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
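
To narrow down whether the failure depends on the address family being requested, the reproduction above can be wrapped in a small helper. This is only a sketch: `probe` and its `resolver` parameter are hypothetical names introduced here, and the idea that an `AF_INET`-only lookup behaves differently from the default lookup is an assumption to be tested, not something the output above proves.

```python
import socket


def probe(host, port, family, attempts=5, resolver=socket.getaddrinfo):
    """Resolve `host` several times in a row with a single address family.

    Returns one entry per attempt: either a sorted list of the addresses
    that came back, or a string describing the gaierror that was raised.
    `resolver` is injectable so the helper can be exercised without DNS.
    """
    outcomes = []
    for _ in range(attempts):
        try:
            infos = resolver(host, port, family, socket.SOCK_STREAM)
            # Each result is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the address for both IPv4 and IPv6.
            outcomes.append(sorted({info[4][0] for info in infos}))
        except socket.gaierror as exc:
            outcomes.append("gaierror: {}".format(exc))
    return outcomes
```

Calling `probe('lb.consul.local', 80, socket.AF_UNSPEC)` and then `probe('lb.consul.local', 80, socket.AF_INET)` back to back would show whether only the lookups that also ask for IPv6 addresses fail on the second and later attempts.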

The logs on the dns side:

2019-02-20T16:35:21.688Z [INFO] 172.10.112.60:41539 - 6366 "AAAA IN lb.consul.local. udp 57 false 512" NOERROR qr,aa,rd 134 0.003542729s
2019-02-20T16:35:21.717Z [INFO] 172.10.112.60:58468 - 40098 "AAAA IN lb.consul.local. udp 57 false 512" NOERROR qr,rd 134 0.000064083s

Again, the failed response is missing aa.
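
The two log lines can also be compared mechanically. The sketch below parses only the fields visible in the lines quoted above; the regular expression is fitted to those two lines and is an assumption about the log layout, not a general CoreDNS log parser, and `parse_log_line` is a hypothetical helper name.

```python
import re

# Fitted to the two log lines quoted above: a quoted query section followed
# by rcode, flags, response size and duration.
LOG_RE = re.compile(
    r'"(?P<qtype>\S+) IN (?P<name>\S+) [^"]*" '
    r'(?P<rcode>\S+) (?P<flags>\S+) (?P<size>\d+) (?P<duration>[\d.]+)s'
)


def parse_log_line(line):
    """Extract query type, name, rcode, flag set and duration from one line."""
    match = LOG_RE.search(line)
    if match is None:
        raise ValueError("unrecognised log line: {!r}".format(line))
    return {
        "qtype": match.group("qtype"),
        "name": match.group("name"),
        "rcode": match.group("rcode"),
        "flags": set(match.group("flags").split(",")),
        "duration": float(match.group("duration")),
    }


first = parse_log_line(
    '2019-02-20T16:35:21.688Z [INFO] 172.10.112.60:41539 - 6366 '
    '"AAAA IN lb.consul.local. udp 57 false 512" NOERROR qr,aa,rd 134 0.003542729s'
)
second = parse_log_line(
    '2019-02-20T16:35:21.717Z [INFO] 172.10.112.60:58468 - 40098 '
    '"AAAA IN lb.consul.local. udp 57 false 512" NOERROR qr,rd 134 0.000064083s'
)

# Flags present in the first response but not in the second.
print(first["flags"] - second["flags"])
```

Both lines are AAAA queries answered with NOERROR, and the only flag that differs is aa. The much shorter duration of the second response is consistent with it being served from the cache.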

EDIT 20th Feb 6:05 PM PST

A few more hours into this, I worked around the problem by disabling the negative cache in CoreDNS through this PR: https://github.com/coredns/coredns/pull/2588.

This seems to have fixed the problem. However, I still don't understand why negative IPv6 (AAAA) results served from the CoreDNS cache cause an exception in the socket library when the IPv4 lookup clearly resolves.
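
For anyone who cannot change the CoreDNS configuration, one client-side experiment is to restrict lookups to IPv4 so that no AAAA result is requested through `getaddrinfo` at all. This is a sketch under assumptions: `ipv4_only` is a hypothetical helper, it only affects code that looks up `socket.getaddrinfo` at call time, and whether it avoids the failure described here has not been verified.

```python
import contextlib
import socket


@contextlib.contextmanager
def ipv4_only():
    """Temporarily force every socket.getaddrinfo call to use AF_INET."""
    original = socket.getaddrinfo

    def ipv4_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
        # Ignore the requested family and ask for IPv4 results only.
        return original(host, port, socket.AF_INET, type, proto, flags)

    socket.getaddrinfo = ipv4_getaddrinfo
    try:
        yield
    finally:
        # Always restore the real resolver, even if the body raised.
        socket.getaddrinfo = original
```

Wrapping the failing call, for example `with ipv4_only(): socket.getaddrinfo('lb.consul.local', 80, 0, socket.SOCK_STREAM)`, would show whether the error still appears when only IPv4 addresses are requested.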

Uday
    Are you sure your system is configured to use this nameserver and that Python application uses it? `qr rd` flags are fine, the problem is not that the answer is not authoritative (it will never be authoritative when coming out of the cache of a recursive nameserver). – Patrick Mevzek Feb 20 '19 at 14:11
