I have a Python Gunicorn web application that throws the following error when it tries to resolve an internal DNS name through a CoreDNS caching server:
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='lb.consul.local', port=80):
Max retries exceeded with url: /hello/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f414d5259b0>:
Failed to establish a new connection: [Errno -2] Name or service not known',))
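For context, the failing call is essentially a plain requests GET against the internal name; a simplified reproduction (not the actual application code) looks like this:

import requests

# Plain GET against the internal name via the CoreDNS cache;
# this is roughly what the application does on each request.
resp = requests.get('http://lb.consul.local/hello/', timeout=5)
print(resp.status_code)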
I am able to resolve the same name using dig:
dig @172.1.0.54 lb.consul.local
; <<>> DiG 9.9.5-9+deb8u16-Debian <<>> lb.consul.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58411
;; flags: qr rd; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;lb.consul.local. IN A
;; ANSWER SECTION:
lb.consul.local. 1 IN A 172.10.9.0
;; Query time: 1 msec
;; SERVER: 172.1.0.54#53(172.1.0.54)
;; WHEN: Wed Feb 20 02:43:47 UTC 2019
;; MSG SIZE rcvd: 358
One thing to note is that the answer is not authoritative: the flags in the dig response are only qr rd, with no aa flag. If I switch /etc/resolv.conf back to point at the authoritative DNS server instead of the CoreDNS server acting as a cache, everything works fine again.
Does the requests library have any issues resolving from non-authoritative sources, or is there a way to configure the library to accept responses from non-authoritative DNS sources?
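As far as I can tell, requests/urllib3 do not implement their own resolver; they simply call socket.getaddrinfo, so there does not appear to be any setting related to authoritative vs non-authoritative answers. The closest knob I have found is retrying connection-level failures (which include DNS errors); a sketch, with the retry parameters being guesses rather than anything from my real config:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
# Retry connection failures (DNS errors surface as connect errors) with backoff.
retries = Retry(connect=3, backoff_factor=0.5)
session.mount('http://', HTTPAdapter(max_retries=retries))

resp = session.get('http://lb.consul.local/hello/', timeout=5)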
EDIT 20th Feb
The server the application is running on is configured correctly to speak to the DNS server specified above:
root@server-test-7bff545c5b-42ln5:/app# cat /etc/resolv.conf
nameserver 172.1.0.54
search nstest.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5
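If it matters, with options ndots:5 the libc resolver will try every search suffix before the name as given, since lb.consul.local only has two dots. A small illustration of that expansion order (my understanding of glibc behaviour, not output captured from the box):

# Lookup order implied by ndots:5: names with fewer than 5 dots get the
# search suffixes appended first, then the bare name is tried last.
# Values are copied from the resolv.conf above.
name = 'lb.consul.local'
search = ['nstest.svc.cluster.local', 'svc.cluster.local',
          'cluster.local', 'ec2.internal']
ndots = 5

candidates = []
if name.count('.') < ndots:
    candidates += [f'{name}.{suffix}' for suffix in search]
candidates.append(name)
print(candidates)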
EDIT 20th Feb 8:50 AM PST
I have been able to reproduce this with just the Python shell inside the machine, by running the same lookup twice back to back:
>>> import socket
>>> socket.getaddrinfo('lb.consul.local', 80, 0, socket.SOCK_STREAM)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('172.10.9.0', 80))]
>>> socket.getaddrinfo('lb.consul.local', 80, 0, socket.SOCK_STREAM)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/socket.py", line 745, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
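To narrow it down further, it may help to query the two address families separately, since getaddrinfo with an unspecified family issues both an A and an AAAA lookup under the hood. A quick check along these lines (a sketch, not something I have run exhaustively):

import socket

# Query A (IPv4) and AAAA (IPv6) separately to see which family fails.
for family, label in ((socket.AF_INET, 'A'), (socket.AF_INET6, 'AAAA')):
    try:
        print(label, socket.getaddrinfo('lb.consul.local', 80, family,
                                        socket.SOCK_STREAM))
    except socket.gaierror as exc:
        print(label, 'failed:', exc)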
The logs on the DNS side:
2019-02-20T16:35:21.688Z [INFO] 172.10.112.60:41539 - 6366 "AAAA IN lb.consul.local. udp 57 false 512" NOERROR qr,aa,rd 134 0.003542729s
2019-02-20T16:35:21.717Z [INFO] 172.10.112.60:58468 - 40098 "AAAA IN lb.consul.local. udp 57 false 512" NOERROR qr,rd 134 0.000064083s
Again, the failed response is missing the aa flag.
EDIT 20th Feb 6:05 PM PST
A few more hours into this and I worked around the problem by disabling the negative cache in CoreDNS through this PR: https://github.com/coredns/coredns/pull/2588.
This seems to have fixed the problem. Still, I have no clue why those negative IPv6 (AAAA) results coming from the CoreDNS cache caused an exception in the socket library when the IPv4 record was clearly resolving.
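If anyone else hits this, another application-side mitigation that appears to work (with urllib3 1.x, and assuming you do not need IPv6 at all) is forcing requests/urllib3 to resolve IPv4 only, so the AAAA path is never exercised:

import socket
import urllib3.util.connection as urllib3_connection

def allowed_gai_family():
    # Make urllib3 (and therefore requests) pass AF_INET to getaddrinfo,
    # so no AAAA lookups are issued for outgoing connections.
    return socket.AF_INET

urllib3_connection.allowed_gai_family = allowed_gai_family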