I am trying to set up a scalable MySQL database with persistent storage. I thought it was something common, but it seems like no one online really explains it. I am using minikube for my single-node cluster. I started off from the Kubernetes guide on how to run replicated stateful applications, but it does not really get into the persistent volume creation. I have created the ConfigMap like the one in the guide:

apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql
  labels:
    app: mysql
data:
  primary.cnf: |
    # Apply this config only on the primary.
    [mysqld]
    log-bin
  replica.cnf: |
    # Apply this config only on replicas.
    [mysqld]
    super-read-only

And the two services:

# Headless service for stable DNS entries of StatefulSet members.
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  ports:
  - name: mysql
    port: 3306
  clusterIP: None
  selector:
    app: mysql
---
# Client service for connecting to any MySQL instance for reads.
# For writes, you must instead connect to the primary: mysql-0.mysql.
apiVersion: v1
kind: Service
metadata:
  name: mysql-read
  labels:
    app: mysql
spec:
  ports:
  - name: mysql
    port: 3306
  selector:
    app: mysql

I needed to initialize the schema of my database, so I made this ConfigMap to mount inside the StatefulSet:

apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-initdb-config
data:
  initdb.sql: |
    CREATE DATABASE football;
    USE football;
    CREATE TABLE `squadra` (
        `name` varchar(15) PRIMARY KEY NOT NULL
    );
    ...

I created a StorageClass and a PersistentVolume, following this answer:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-volume
spec:
  storageClassName: local-storage
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 5Gi
  hostPath:
    path: /data/mysql_data/

And I made a Secret containing the password for the root user of my DB:

apiVersion: v1
kind: Secret
metadata:
  name: mysql-pass-root
type: kubernetes.io/basic-auth
stringData:
  username: root
  password: password

The StatefulSet I have was substantially taken from an answer on Stack Overflow that I can no longer find. It was basically a modification of the one on the Kubernetes website; what I added is the database schema initialization, the volumeClaimTemplates, and the password from the Secret:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  selector:
    matchLabels:
      app: mysql
  serviceName: mysql
  replicas: 2
  template:
    metadata:
      labels:
        app: mysql
    spec:
      initContainers:
      - name: init-mysql
        image: mysql:5.7
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Generate mysql server-id from pod ordinal index.
          [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          echo [mysqld] > /mnt/conf.d/server-id.cnf
          # Add an offset to avoid reserved server-id=0 value.
          echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
          # Copy appropriate conf.d files from config-map to emptyDir.
          if [[ $ordinal -eq 0 ]]; then
            cp /mnt/config-map/primary.cnf /mnt/conf.d/
          else
            cp /mnt/config-map/replica.cnf /mnt/conf.d/
          fi
        volumeMounts:
        - name: conf
          mountPath: /mnt/conf.d
        - name: config-map
          mountPath: /mnt/config-map
      - name: clone-mysql
        image: gcr.io/google-samples/xtrabackup:1.0
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Skip the clone if data already exists.
          [[ -d /var/lib/mysql/mysql ]] && exit 0
          # Skip the clone on master (ordinal index 0).
          [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          [[ $ordinal -eq 0 ]] && exit 0
          # Clone data from previous peer.
          ncat --recv-only mysql-$(($ordinal-1)).mysql 3307 | xbstream -x -C /var/lib/mysql
          # Prepare the backup.
          xtrabackup --prepare --target-dir=/var/lib/mysql
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
      containers:
      - name: mysql
        image: mysql:5.7
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-pass-root
              key: password
        - name: MYSQL_USER
          value: server
        - name: MYSQL_DATABASE
          value: medlor
        ports:
        - name: mysql
          containerPort: 3306
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        - name: mysql-initdb
          mountPath: /docker-entrypoint-initdb.d
        resources:
          requests:
            cpu: 500m
            memory: 100Mi
        livenessProbe:
          exec:
            command: ["/bin/sh", "-ec", "mysqladmin ping -uroot -p\"$MYSQL_ROOT_PASSWORD\""]
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
        readinessProbe:
          exec:
            # Check we can execute queries over TCP (skip-networking is off).
            command:
            - /bin/sh
            - -ec
            - >-
              mysql -hlocalhost -uroot -p$MYSQL_ROOT_PASSWORD -e'SELECT 1'
          initialDelaySeconds: 5
          periodSeconds: 2
          timeoutSeconds: 1
      - name: xtrabackup
        image: gcr.io/google-samples/xtrabackup:1.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-pass-root
              key: password
        ports:
        - name: xtrabackup
          containerPort: 3307
        command:
        - bash
        - "-c"
        - |
          set -ex
          cd /var/lib/mysql

          # Determine binlog position of cloned data, if any.
          if [[ -f xtrabackup_slave_info ]]; then
            # XtraBackup already generated a partial "CHANGE MASTER TO" query
            # because we're cloning from an existing slave.
            mv xtrabackup_slave_info change_master_to.sql.in
            # Ignore xtrabackup_binlog_info in this case (it's useless).
            rm -f xtrabackup_binlog_info
          elif [[ -f xtrabackup_binlog_info ]]; then
            # We're cloning directly from master. Parse binlog position.
            [[ `cat xtrabackup_binlog_info` =~ ^(.*?)[[:space:]]+(.*?)$ ]] || exit 1
            rm xtrabackup_binlog_info
            echo "CHANGE MASTER TO MASTER_LOG_FILE='${BASH_REMATCH[1]}',\
                  MASTER_LOG_POS=${BASH_REMATCH[2]}" > change_master_to.sql.in
          fi

          # Check if we need to complete a clone by starting replication.
          if [[ -f change_master_to.sql.in ]]; then
            echo "Waiting for mysqld to be ready (accepting connections)"
            until mysql -h localhost -uroot -p$MYSQL_ROOT_PASSWORD -e "SELECT 1"; do sleep 1; done

            echo "Initializing replication from clone position"
            # In case of container restart, attempt this at-most-once.
            mv change_master_to.sql.in change_master_to.sql.orig
            mysql -h localhost -uroot -p$MYSQL_ROOT_PASSWORD <<EOF
          $(<change_master_to.sql.orig),
            MASTER_HOST='mysql-0.mysql',
            MASTER_USER='root',
            MASTER_PASSWORD='$MYSQL_ROOT_PASSWORD',
            MASTER_CONNECT_RETRY=10;
          START SLAVE USER='root' PASSWORD='$MYSQL_ROOT_PASSWORD';
          EOF
          fi

          # Start a server to send backups when requested by peers.
          exec ncat --listen --keep-open --send-only --max-conns=1 3307 -c \
            "xtrabackup --backup --slave-info --stream=xbstream --host=127.0.0.1 --user=root --password=$MYSQL_ROOT_PASSWORD"
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            cpu: 100m
            memory: 500Mi
      volumes:
      - name: conf
        emptyDir: {}
      - name: config-map
        configMap:
          name: mysql
      - name: mysql-initdb
        configMap:
          name: mysql-initdb-config
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "local-storage"
      resources:
        requests:
          storage: 1Gi

When I start up minikube, I run the following commands:

docker exec minikube mkdir data/mysql_data (to create the folder where MySQL saves its data inside the minikube container)
minikube mount local_path_to_a_folder:/data/mysql_data (to keep the MySQL data on my physical storage too)
kubectl apply -f pv-volume.yaml (the volume I showed before)
kubectl apply -f database\mysql\mysql-configmap.yaml (the configmap from the guide)
kubectl apply -f database\mysql\mysql-initdb-config.yaml (my configmap with the db schema)
kubectl apply -f database\mysql\mysql-secret.yaml (the secret containing the db password)
kubectl apply -f database\mysql\mysql-services.yaml (the two services)

Now, when I run all these commands, my MySQL pod fails to start, and this is what I can see from the dashboard (screenshot: unbound pending).

Although in the Persistent Volumes section I can see that my volume has been claimed (screenshot: claimed pv).

I must have misunderstood something, and there must be a logic error somewhere in all of this, but I can't figure out what it is. Any help would be extremely appreciated.
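
For reference, these are the kinds of commands that show the binding state (the first is also what the comments below ask for); the pod and PVC names follow from the StatefulSet above:

kubectl get pvc               # lists the generated claims (data-mysql-0, data-mysql-1) and their status
kubectl get pv                # shows whether pv-volume is Available or Bound
kubectl describe pod mysql-0  # the Events section explains why scheduling/binding failed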

  • You should not normally need to create a StorageClass, PersistentVolume, or PersistentVolumeClaim yourself; the StorageClass should come from the cluster, the StatefulSet creates the PVC, and a piece called a _provisioner_ creates the PV. Minikube [should support a basic persistent volume provisioner](https://minikube.sigs.k8s.io/docs/handbook/persistent_volumes/#dynamic-provisioning-and-csi); does deleting the extra storage setup help? You also have a _lot_ of containers there, does deleting everything other than the main database make a difference? – David Maze Jan 25 '22 at 16:03
  • What's the output of `kubectl get pvc`? Also, I see there is only 1 pv when the desired replicas in statefulset is 2. – Vishrant Jan 25 '22 at 16:11
  • @DavidMaze that's not always true, in case of Local Persistent Volumes and self managed k8s clusters you need to create the sc and pv. – Vishrant Jan 25 '22 at 16:13
  • @DavidMaze thank you for your comment. I have to admit, I am quite confused on the whole persistent storage management. I guess I just have to delete the storage class then, and keep the volume (without the storageclassname) since it is almost like the one you linked to. I think my statefulset does create the pvc though, I have not manually created one. Each pod should have, according to the guide, two containers, one for the main db and one for the replication from previous pods, so I don't think I can delete much. –  Jan 25 '22 at 16:15
  • "> I think my statefulset does create the pvc though" it would be because you might have defined the `volumeClaimTemplates` in the statefulset definition. – Vishrant Jan 25 '22 at 16:17
  • @Vishrant thank you for the comment. The output is `NAME:data-mysql-0 STATUS:Bound VOLUME:pv-volume CAPACITY:5Gi ACCESS MODES: STORAGECLASS:local-storage AGE:25m ` –  Jan 25 '22 at 16:18
  • @Vishrant I don't think I did, I just set the volumeClaimTemplates. I thought this is the way it should work... The desired replicas are two, as you said, but I would like to scale them with an HPA afterwards. Does that mean I would have to create one pv for each pod? –  Jan 25 '22 at 16:20
  • ah yes `volumeClaimTemplates` is the way to create the PVC from statefulset. – Vishrant Jan 25 '22 at 16:22
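
As David Maze's first comment suggests, on minikube one could drop the custom StorageClass and PersistentVolume entirely and let the built-in dynamic provisioner create a PV for each PVC the StatefulSet generates. A minimal sketch of the claim template in that case, assuming the default StorageClass is named standard (the usual name on minikube; check with `kubectl get storageclass`):

  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "standard"   # minikube's default, dynamically provisioned
      resources:
        requests:
          storage: 1Gi

With dynamic provisioning, each generated PVC (data-mysql-0, data-mysql-1, ...) gets its own automatically created PV, so no hand-written pv-volume is needed.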

1 Answer

The storage sizes of the PV and of the PVC request do not match: you created a PV of size 5Gi, but you are requesting 1Gi with the PVC.

(Before applying new changes to the PV, make sure the old resources have been removed.)
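
A rough sketch of the cleanup/re-apply that implies, assuming you keep the 5Gi pv-volume and raise the claim template's request to match (the StatefulSet manifest filename below is a placeholder, since the question doesn't name it):

kubectl delete statefulset mysql
kubectl delete pvc data-mysql-0 data-mysql-1   # whichever of the generated claims exist; new ones get created from the updated template
kubectl delete pv pv-volume
# edit the manifests so the PV capacity and the volumeClaimTemplates request agree (e.g. both 5Gi)
kubectl apply -f pv-volume.yaml
kubectl apply -f mysql-statefulset.yaml        # placeholder name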

Vishrant
  • I thought 5Gi was the size of the whole persistent volume, while 1Gi was the one each pod requested access to. Isn't it like that? However, I just tried changing both to 5Gi, but nothing changed. –  Jan 25 '22 at 16:37
  • No, in the case of a StatefulSet a PV gets created on each node, and there is no sharing between pods unless you are using the same PVC (in that case as well it will use the entire PV). In your case there are 2 replicas, which means it will expect two PVCs, `data-mysql-0` and `data-mysql-1`. There is no partitioning of the PV. – Vishrant Jan 25 '22 at 16:42
  • Sorry to bother you again, but I can't fully understand it. If there is no partitioning of the PV, and each pod creates its own, then why create a PV at all? I followed this logic: I created a storageclass (to choose which PV a pod should use), a PV that was referencing that storageclass (to make it possible to allocate inside the PV when requested to the storageclass) and a volumeclaimtemplate whose job it is to create PVCs for each pod in the statefulset. I thought each of these PVCs (generated w/template) was asking the storageclass which volume to use, getting answered with the only PV it found. –  Jan 25 '22 at 20:36
  • You can think of a PV as storage space on each Kubernetes node (for simplicity, a local persistent volume; EBS volumes work a little differently as they are network storage). – Vishrant Jan 25 '22 at 21:57
  • When you create a StatefulSet, your application is looking for some statefulness, which means that even if the pod goes down the data should persist and the application should continue reading data from where it left off. This will happen if the pod (Docker container) gets created on the same k8s node and the same storage is assigned. – Vishrant Jan 25 '22 at 21:59
  • When you create a PV (in the case of local persistent volumes), storage space gets reserved on a node for one of the replicas of that StatefulSet. – Vishrant Jan 25 '22 at 22:00
  • Does that mean I would have to create as many persistent volumes as the number of replicas I want? –  Jan 25 '22 at 22:02
  • the StatefulSet pods get created in order, e.g. `pod-0`, `pod-1`, and in the same order they will get attached to the PVCs `pvc-0`, `pvc-1` – Vishrant Jan 25 '22 at 22:02
  • > Does that mean I would have to create as many persistent volumes as the number of replicas I want? Yes – Vishrant Jan 25 '22 at 22:02
  • What if the number of replicas is not predefined? If I want to use a pod autoscaler for example. Would I have to manually create a lot of PVs? –  Jan 25 '22 at 22:03
  • for pod autoscaling you want to make sure that the volumes get scaled too; it's easier in the case of public-cloud-provided volumes like EBS, but in the case of local persistent volumes you have to scale them yourself or have an operator that does the scaling of PVs when there is a scale-up event (https://www.redhat.com/en/topics/containers/what-is-a-kubernetes-operator) – Vishrant Jan 25 '22 at 22:05
  • I have not heard of partitioning of persistent volumes; the partitioning of storage can be done at the application level though, for example in HDFS/Hive, but those applications get the entire persistent volumes attached to the worker nodes. – Vishrant Jan 25 '22 at 22:09
  • Did changing the storage size fix the issue? @gijoyah – Vishrant Jan 25 '22 at 22:10
  • Thank you very much, you can't imagine how much your explanation helped me. I tried changing the values as you said, so I have the statefulsets making PVCs for 1Gi and one PV for 1Gi. I also changed the number of replicas for now, putting it to 1 in order to avoid not having enough PVs. The MySQL pod still fails to start, now with the following error: `0/1 nodes are available: 1 node(s) didn't find available persistent volumes to bind. Back-off restarting failed container` . So, I guess this was not the only problem. –  Jan 25 '22 at 22:24
  • This answer of mine might be useful for the last error. https://stackoverflow.com/a/70069138/2704032 – Vishrant Jan 25 '22 at 22:38
  • I ran `kubectl get nodes --show-labels` and got all the labels from my only node, minikube. However, these do not include the name of the persistent volume. –  Jan 25 '22 at 22:51
  • Adding new PVs solved the problem. I do not really know why, since only one pod was running, but your explanation solved my issue. Thank you very much again for your help! –  Jan 26 '22 at 01:09
  • Glad the issue is resolved. The PV should be manually deleted before applying the new changes; if you were applying kubectl changes on top of the old PV then there would not be any update and the PV would have remained the same size, but when you created the new PV it came with 1Gi capacity, which got attached to the PVC looking for a 1Gi PV. – Vishrant Jan 26 '22 at 04:05
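
Putting the comment thread together: with replicas: 2 and statically provisioned storage, one PersistentVolume per replica is needed, each at least as large as the 1Gi requested by the claim template. A minimal sketch under those assumptions (the PV names and hostPath subdirectories are illustrative; the directories must exist inside the minikube node, e.g. created with the same `docker exec minikube mkdir` approach as in the question):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-volume-0
spec:
  storageClassName: local-storage
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi          # at least the 1Gi requested by the volumeClaimTemplates
  hostPath:
    path: /data/mysql_data/0
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-volume-1
spec:
  storageClassName: local-storage
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  hostPath:
    path: /data/mysql_data/1

Each generated PVC (data-mysql-0, data-mysql-1) can then bind to a PV of its own; scaling the StatefulSet further means adding more PVs, or switching to a dynamically provisioned StorageClass.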