I am trying to build a Pod that runs a service that requires:
- cluster-internal services to be resolved and accessed by their FQDN (*.cluster.local),
- an active OpenVPN connection to a remote cluster, with services from that remote cluster also resolved and accessed by their FQDN (*.cluster.remote).
Without an OpenVPN sidecar, the service container within the Pod can access all services by their FQDN in the *.cluster.local namespace. Here is /etc/resolv.conf in this case:
nameserver 169.254.25.10
search default.cluster.local svc.cluster.local cluster.local
options ndots:5
When the OpenVPN sidecar manages resolv.conf
The OpenVPN sidecar is started in the following way:
containers:
{{- if .Values.vpn.enabled }}
- name: vpn
image: "ghcr.io/wfg/openvpn-client"
imagePullPolicy: {{ .Values.image.pullPolicy | quote }}
volumeMounts:
- name: vpn-working-directory
mountPath: /data/vpn
env:
- name: KILL_SWITCH
value: "off"
- name: VPN_CONFIG_FILE
value: connection.conf
securityContext:
privileged: true
capabilities:
add:
- "NET_ADMIN"
resources:
limits:
cpu: 100m
memory: 80Mi
requests:
cpu: 25m
memory: 20Mi
{{- end }}
and the OpenVPN client configuration contains the following lines:
script-security 2
up /etc/openvpn/up.sh
down /etc/openvpn/down.sh
Then the OpenVPN client overwrites resolv.conf so that it contains the following:
nameserver 192.168.255.1
options ndots:5
In this case, any service in *.cluster.remote is resolved, but no service in *.cluster.local is. This is expected.
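For completeness, my understanding is that the 192.168.255.1 nameserver is pushed by the remote endpoint and written out by up.sh; the remote server configuration presumably contains something along these lines (I do not control the remote side, so this is only an assumption):
# assumed remote OpenVPN server directive pushing its DNS to clients
push "dhcp-option DNS 192.168.255.1"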
When the OpenVPN sidecar does not manage resolv.conf, but spec.dnsConfig is provided
Remove the following lines from the OpenVPN client configuration:
script-security 2
up /etc/openvpn/up.sh
down /etc/openvpn/down.sh
The spec.dnsConfig is provided as:
dnsConfig:
nameservers:
- 192.168.255.1
searches:
- cluster.remote
Then resolv.conf will be the following:
nameserver 192.168.255.1
nameserver 169.254.25.10
search default.cluster.local svc.cluster.local cluster.local cluster.remote
options ndots:5
This works for *.cluster.remote, but not for anything in *.cluster.local, because the second nameserver is only tried once the first one times out. I noticed that some folks get around this limitation by setting up nameserver rotation and a timeout of 1 second, but this behavior looks very hectic to me, and I would not consider it, not even as a workaround. Or maybe I'm missing something. My first question would be: could rotation and timeout work in this case?
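For reference, the workaround I am referring to would look roughly like this in the Pod spec (untested; the rotate and timeout entries map directly onto the corresponding resolv.conf options):
dnsConfig:
  nameservers:
    - 192.168.255.1
  searches:
    - cluster.remote
  options:
    - name: rotate    # round-robin across all nameservers
    - name: timeout   # fall back to the next nameserver after 1 second
      value: "1"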
My second question would be: is there any way to make both *.cluster.local and *.cluster.remote DNS resolution work reliably from the service container inside the Pod, without using something like dnsmasq?
My third question would be: if dnsmasq is required, how can I configure it, provide it, and overwrite resolv.conf accordingly, while also making sure that the Kubernetes-provided nameserver can be anything (169.254.25.10 in this case)?
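To make this more concrete, what I have in mind is a small dnsmasq instance in the Pod that splits queries per zone, roughly like this (a hypothetical sketch with the kube-dns address hard-coded, which is exactly what I would like to avoid):
# dnsmasq.conf sketch: forward each zone to its own upstream resolver
server=/cluster.local/169.254.25.10    # Kubernetes-provided nameserver
server=/cluster.remote/192.168.255.1   # nameserver pushed by the remote OpenVPN endpoint
no-resolv                              # ignore the Pod's own resolv.conf
with resolv.conf in the service container then pointing at 127.0.0.1.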
Best, Zoltán