
I am trying to enable a multi-cluster CockroachDB spanning 3 k8s clusters connected with Cilium Cluster Mesh. The idea of a multi-cluster CockroachDB is described on cockroachlabs.com - 1, 2. Given that the article calls for a change to the CoreDNS ConfigMap instead of using Cilium global services, that approach feels suboptimal.

Therefore the question arises: how can a multi-cluster CockroachDB be enabled in a Cilium Cluster Mesh environment, using Cilium global services instead of hacking the CoreDNS ConfigMap?

When CockroachDB is installed via Helm, it deploys a StatefulSet with a carefully crafted --join parameter that contains the FQDNs of the CockroachDB pods that are to join the cluster.

The pod FQDNs come from service.discovery, which is created with clusterIP: None and

(...) only exists to create DNS entries for each pod in the StatefulSet such that they can resolve each other's IP addresses.

The discovery service automatically registers DNS entries for all pods within the StatefulSet, so that they can be easily referenced
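
For reference, this is roughly what such a headless discovery Service looks like; a minimal sketch assuming the release name dbs in namespace dbs used below (the real chart template carries more metadata):

apiVersion: v1
kind: Service
metadata:
  name: dbs-cockroachdb
  namespace: dbs
spec:
  clusterIP: None                  # headless: per-pod DNS records instead of a virtual IP
  publishNotReadyAddresses: true   # pods are resolvable before they pass readiness checks
  selector:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: dbs
    app.kubernetes.io/name: cockroachdb
  ports:
  - name: grpc
    port: 26257
    targetPort: grpc
  - name: http
    port: 8080
    targetPort: http

Each StatefulSet pod then resolves as dbs-cockroachdb-<ordinal>.dbs-cockroachdb.dbs.svc.cluster.local, which is exactly the form the chart puts into --join.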

Can a similar discovery service, or some alternative, be created for a StatefulSet running on a remote cluster, so that with Cluster Mesh enabled, pods J, K, L in cluster B could be reached from pods X, Y, Z in cluster A by their FQDN?

As suggested in create-service-per-pod-in-statefulset, one could create services like

{{- range $i, $_ := until 3 -}}
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
    io.cilium/global-service: 'true'
    service.cilium.io/affinity: "remote"
  labels:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: dbs
    app.kubernetes.io/name: cockroachdb
  name: dbs-cockroachdb-remote-{{ $i }}
  namespace: dbs
spec:
  ports:
  - name: grpc
    port: 26257
    protocol: TCP
    targetPort: grpc
  - name: http
    port: 8080
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: dbs
    app.kubernetes.io/name: cockroachdb
    statefulset.kubernetes.io/pod-name: cockroachdb-{{ $i }}
  type: ClusterIP
  clusterIP: None
  publishNotReadyAddresses: true
---
kind: Service
apiVersion: v1
metadata:
  name: dbs-cockroachdb-public-remote-{{ $i }}
  namespace: dbs
  labels:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: dbs
    app.kubernetes.io/name: cockroachdb
  annotations:
    io.cilium/global-service: 'true'
    service.cilium.io/affinity: "remote"
spec:
  ports:
  - name: grpc
    port: 26257
    protocol: TCP
    targetPort: grpc
  - name: http
    port: 8080
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: dbs
    app.kubernetes.io/name: cockroachdb
{{- end -}}

The intent is that these resemble the original service.discovery and service.public.

However, despite the presence of the Cilium annotations

io.cilium/global-service: 'true'
service.cilium.io/affinity: "remote"

the services appear bound to the local k8s cluster, resulting in a CockroachDB cluster of 3 nodes instead of 6 (3 in cluster A + 3 in cluster B).

(Screenshots: Hubble shows no cross-cluster connections, and the CockroachDB dashboard shows only 3 nodes.)

It does not matter which service (dbs-cockroachdb-public-remote-X or dbs-cockroachdb-remote-X) I use in my --join override:

    join:
      - dbs-cockroachdb-0.dbs-cockroachdb.dbs:26257
      - dbs-cockroachdb-1.dbs-cockroachdb.dbs:26257
      - dbs-cockroachdb-2.dbs-cockroachdb.dbs:26257
      - dbs-cockroachdb-public-remote-0.dbs:26257
      - dbs-cockroachdb-public-remote-1.dbs:26257
      - dbs-cockroachdb-public-remote-2.dbs:26257

The result is the same, 3 nodes instead of 6.

Any ideas?

  • Consider asking sys-admin questions on https://serverfault.com instead. Stack Overflow is for programming questions. – Jonas Aug 08 '23 at 17:50
  • You should not use `io.cilium/global-service: 'true'` in the first place. The moment you add a third cluster, which remote will it point to? It could be any of the 2. You don't want this. At this moment Cilium ClusterMesh seems a bit underdeveloped for this use case. Ultimately you would just want to expose unique services across clusters, like `cockroachdb-0.eu-west.svc.cluster.local` and `cockroachdb-0.us-east.svc.cluster.local`. – Steffen Brem Aug 18 '23 at 02:08

1 Answer


Apparently, due to 7070, patching the CoreDNS ConfigMap is the most reasonable thing we can do. In the comments of that issue, an article is mentioned that provides additional context.

My twist on this story is that I updated the ConfigMap with a kubernetes plugin config:

apiVersion: v1
data:
  Corefile: |-
    saturn.local {
      log
      errors
      kubernetes saturn.local {
        # API server of the cluster that owns the saturn.local zone
        endpoint https://[ENDPOINT]
        # kubeconfig for that cluster, mounted from a secret (see below)
        kubeconfig [PATH_TO_KUBECONFIG]
      }
    }
    rhea.local {
      ...

So that I could resolve names from the other clusters as well. In my setup, each cluster has its own domain.local. PATH_TO_KUBECONFIG is a plain kubeconfig file. A generic secret has to be created in the kube-system namespace, and the secret volume has to be mounted into the coredns Deployment.
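
A sketch of that wiring, assuming a secret named remote-kubeconfig and a mount path of /etc/coredns/remote (both are placeholders; whichever path you choose must match PATH_TO_KUBECONFIG in the Corefile):

# Hypothetical Secret holding the plain kubeconfig file
apiVersion: v1
kind: Secret
metadata:
  name: remote-kubeconfig
  namespace: kube-system
stringData:
  kubeconfig: |
    # contents of the kubeconfig file go here
---
# Strategic-merge patch for the coredns Deployment in kube-system:
# mounts the secret so CoreDNS can read /etc/coredns/remote/kubeconfig
spec:
  template:
    spec:
      containers:
      - name: coredns
        volumeMounts:
        - name: remote-kubeconfig
          mountPath: /etc/coredns/remote
          readOnly: true
      volumes:
      - name: remote-kubeconfig
        secret:
          secretName: remote-kubeconfig

The patch part can be applied, for example, with kubectl patch deployment coredns -n kube-system --patch-file coredns-kubeconfig-patch.yaml.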

It works.
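
With each cluster serving its own zone, the join override can then mix pod FQDNs from both clusters. A sketch, assuming the names from the question and two clusters whose zones are saturn.local and rhea.local:

    join:
      # pods in the cluster serving saturn.local
      - dbs-cockroachdb-0.dbs-cockroachdb.dbs.svc.saturn.local:26257
      - dbs-cockroachdb-1.dbs-cockroachdb.dbs.svc.saturn.local:26257
      - dbs-cockroachdb-2.dbs-cockroachdb.dbs.svc.saturn.local:26257
      # pods in the cluster serving rhea.local
      - dbs-cockroachdb-0.dbs-cockroachdb.dbs.svc.rhea.local:26257
      - dbs-cockroachdb-1.dbs-cockroachdb.dbs.svc.rhea.local:26257
      - dbs-cockroachdb-2.dbs-cockroachdb.dbs.svc.rhea.local:26257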

  • I indeed think this is the best way as of now. Another solution, maybe even nicer, would be if Cilium ClusterMesh could connect to the ClusterIP of the CoreDNS service on the other cluster. That way, you don't need the kubernetes plugin in CoreDNS anymore and you can just `forward .` to the CoreDNS ClusterIP. Sadly, as of now, I don't know of a good way to let Cilium ClusterMesh connect to the service CIDR range. – Steffen Brem Aug 18 '23 at 16:13