
I have a k8s cluster with one master and 3 worker nodes. I have set up the Crunchy Postgres Operator in high-availability mode with 2 replicas. This is my deployment file:

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo-ha
spec:
  service:
    type: LoadBalancer
  patroni:
    dynamicConfiguration:
      synchronous_mode: true
      postgresql:
        parameters:
          synchronous_commit: "on"
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.6-2
  postgresVersion: 14
  instances:
    - name: pgha1
      replicas: 2
      dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                postgres-operator.crunchydata.com/cluster: hippo-ha
                postgres-operator.crunchydata.com/instance-set: pgha1
          #- weight: 1
            #podAffinityTerm:
              #topologyKey: kubernetes.io/hostname
              #labelSelector:
                #matchLabels:
                  #postgres-operator.crunchydata.com/cluster: hippo-ha
                  #postgres-operator.crunchydata.com/instance-set: pgha1
  
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.3.0-0
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.41-2
      repos:
      - name: repo1
        volume:
          volumeClaimSpec:
            accessModes:
            - "ReadWriteOnce"
            resources:
              requests:
                storage: 1Gi
  proxy:
    pgBouncer:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbouncer:ubi8-1.17-5
      replicas: 2
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: hippo-ha
                  postgres-operator.crunchydata.com/role: pgbouncer

This deploys the pods on 2 different nodes as expected. One is the primary pod and the other is a replica pod. So far so good.

 masterk8s@-machine:~/postgres-operator-examples-3$ kubectl get pods
NAME                                    READY   STATUS      RESTARTS       AGE
crunchy-alertmanager-5cd75b4f75-m6k5l   1/1     Running     0              79m
crunchy-grafana-64b9f9dcc-kl9f7         1/1     Running     1 (74m ago)    79m
crunchy-prometheus-dc4cbff87-hspst      0/1     Running     1 (74m ago)    79m
hippo-ha-backup-478f-svf6j              0/1     Completed   0              92m
hippo-ha-pgbouncer-7b5f679db4-glj7s     2/2     Running     2 (106m ago)   142m
hippo-ha-pgbouncer-7b5f679db4-z74zx     2/2     Running     0              142m
hippo-ha-pgha1-5v9l-0                   5/5     Running     0              18m
hippo-ha-pgha1-ltb2-0                   5/5     Running     0              63m
hippo-ha-repo-host-0                    2/2     Running     4 (62m ago)    142m
pgo-7c867985c-cwbgp                     1/1     Running     0              152m
pgo-upgrade-69b5dfdc45-xjdxt            1/1     Running     0              152m

What problem I faced: I checked which pod was the primary Postgres pod; it was running on worker node 3. I then shut down worker node 3 to test the failover.
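
For reference, this is roughly how I located the primary pod (a minimal check, assuming the operator's role=master label marks the current primary):

kubectl get pods -o wide \
  -l postgres-operator.crunchydata.com/cluster=hippo-ha,postgres-operator.crunchydata.com/role=master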

Result: all pods on worker node 3 were stuck in the Terminating state.

What I am expecting: the pods from worker node 3 should be rescheduled onto the other 2 available nodes.

Problem: while the pods are stuck in the Terminating state I am not able to make any connection to the database, so data cannot be fetched or posted. This completely fails the high-availability test.
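
This is roughly what I checked after shutting the node down (a minimal sketch; worker-node3 is simply the name of my third worker):

kubectl get nodes          # the shut-down worker is expected to show NotReady
kubectl get pods -o wide   # its pods stay in Terminating instead of being rescheduled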

What I have done: I tried both preferredDuringSchedulingIgnoredDuringExecution and requiredDuringSchedulingIgnoredDuringExecution, as shown in the commented-out code above. In both cases the pods are stuck in the Terminating state and I am not able to access the database.

I am sure I missed something, but I am not able to find the mistake. Can you please help me find the issue? Why are the pods not being redeployed to recreate the required number of replicas? It would be a great help. Thanks.

  • I have faced issues like this before, and I wonder if it's something to do with the way that the cluster is provisioning volumes. Could the volume be tied to the worker? In my case it was on EKS, and it was because the worker was tied to the volume that was in the AZ. Once I re-added a worker in the same AZ it started working again – Josh Beauregard Mar 06 '23 at 16:03
  • I don't think the volume should be an issue here. I used a Rook-Ceph setup and I have an external drive (sdb) on each worker node, so the data should be synced. I also tried the OpenEBS hostpath storage class and faced the same issue. – tauqeerahmad24 Mar 07 '23 at 10:34
  • I created an OpenEBS cStor setup. Each node has an external drive linked to the cStor storage pool. I created a StorageClass and PV, and each pod has its own PVC under the namespace. Please tell me how I can find out if the volume is tied to the worker node (see the commands sketched below). – tauqeerahmad24 Mar 10 '23 at 15:06
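
    A minimal way to check whether a PersistentVolume is pinned to a node (the <pv-name> placeholder stands for whichever PV is bound to one of the hippo-ha claims):

    kubectl get pvc                                               # find the PV bound to each hippo-ha claim
    kubectl get pv <pv-name> -o jsonpath='{.spec.nodeAffinity}'   # non-empty output means the PV is tied to specific node(s)
    kubectl describe pv <pv-name>                                 # Node Affinity and topology labels are also shown here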

0 Answers