72

Is it possible to restart pods automatically based on the time?

For example, I would like to restart the pods of my cluster every morning at 8.00 AM.

Paul Roub
Leonardo Carraro

7 Answers

168

Use a CronJob, but not to run your pods; instead, use it to schedule a Kubernetes API call that restarts the deployment every day (`kubectl rollout restart`). That way, if something goes wrong, the old pods will not be taken down or removed.

A rollout creates a new ReplicaSet and waits for it to be up before killing off the old pods and rerouting traffic, so the Service continues uninterrupted.

You have to set up RBAC so that the Kubernetes client running inside the cluster has permission to make the needed calls to the Kubernetes API.

---
# Service account the client will use to reset the deployment,
# by default the pods running inside the cluster can do no such things.
kind: ServiceAccount
apiVersion: v1
metadata:
  name: deployment-restart
  namespace: <YOUR NAMESPACE>
---
# allow getting status and patching only the one deployment you want
# to restart
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-restart
  namespace: <YOUR NAMESPACE>
rules:
  - apiGroups: ["apps", "extensions"]
    resources: ["deployments"]
    resourceNames: ["<YOUR DEPLOYMENT NAME>"]
    verbs: ["get", "patch", "list", "watch"] # "list" and "watch" are only needed
                                             # if you want to use `rollout status`
---
# bind the role to the service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployment-restart
  namespace: <YOUR NAMESPACE>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployment-restart
subjects:
  - kind: ServiceAccount
    name: deployment-restart
    namespace: <YOUR NAMESPACE>

And the cronjob specification itself:

apiVersion: batch/v1 # use batch/v1beta1 on clusters older than 1.21
kind: CronJob
metadata:
  name: deployment-restart
  namespace: <YOUR NAMESPACE>
spec:
  concurrencyPolicy: Forbid
  schedule: '0 8 * * *' # cron spec of time, here, 8 o'clock
  jobTemplate:
    spec:
      backoffLimit: 2 # this has very low chance of failing, as all this does
                      # is prompt kubernetes to schedule new replica set for
                      # the deployment
      activeDeadlineSeconds: 600 # timeout, makes most sense with 
                                 # "waiting for rollout" variant specified below
      template:
        spec:
          serviceAccountName: deployment-restart # name of the service
                                                 # account configured above
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl # probably any kubectl image will do;
                                     # optionally pin a version, but this
                                     # should not be necessary, as long as
                                     # the version of kubectl is new enough
                                     # to have `rollout restart`
              command:
                - 'kubectl'
                - 'rollout'
                - 'restart'
                - 'deployment/<YOUR DEPLOYMENT NAME>'

Optionally, if you want the cronjob to wait for the deployment to roll out, change the cronjob command to:

command:
 - bash
 - -c
 - >-
   kubectl rollout restart deployment/<YOUR DEPLOYMENT NAME> &&
   kubectl rollout status deployment/<YOUR DEPLOYMENT NAME>
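
Once the RBAC objects and the CronJob are applied, you don't have to wait for the schedule to verify that the permissions work. Assuming your kubectl is recent enough to support `--from` on `create job`, you can trigger a one-off run of the same job (the `-manual` job name here is just an example):

# create a one-off Job from the CronJob's template and watch its output
kubectl create job --from=cronjob/deployment-restart deployment-restart-manual -n <YOUR NAMESPACE>
kubectl logs -f job/deployment-restart-manual -n <YOUR NAMESPACE>
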
OhJeez
  • While this doesn't technically answer the question asked, this is (IMO) by far the best option for periodic restarts of a cluster's pods! – cyberconte Feb 19 '20 at 19:51
  • This answer saved our lives and helped us overcome a huge incident and money loss while investigating and fixing the root cause. Thank YOU! – Ahmed Ayoub Mar 06 '20 at 17:28
  • From https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs. **Caution**: All CronJob schedule: times are based on the timezone of the kube-controller-manager. If your control plane runs the kube-controller-manager in Pods or bare containers, the timezone set for the kube-controller-manager container determines the timezone that the cron job controller uses. – Ricardo Cardona Ramirez Aug 13 '20 at 20:17
  • @OhJeez Can I restart just the oldest pod (not all) if I have 30 pods running for a deployment using this? – user312307 Sep 22 '20 at 13:30
  • A counterpoint: don't host resources with the ability to self-modify a k8s cluster; isolate those tasks in a separate cronjob tool, external to the cluster. Ideally, this tool should be the one offered by the service provider managing your cluster -- e.g. https://cloud.google.com/scheduler for GKE, or in the case of AWS, a second ECS cluster for running highly sensitive jobs in other ECS clusters – yurisich Oct 16 '20 at 18:52
  • Notice however that changes to `ServiceAccount` require elevated privileges (cluster admin), while the `livenessProbe` approach can be used by ordinary devs... – mirekphd Feb 16 '21 at 19:20
  • Hey @OhJeez, I have the same use case and want to restart two k8s deployments which run in separate clusters. What I am not able to understand is where I need to add the RBAC YAML: inside the service I want to restart, or the k8s job which will trigger the restarts? – Nikhil Verma Sep 23 '21 at 09:51
  • Update: in CronJob change `batch/v1beta1` to `batch/v1` to make it work – chill appreciator Aug 24 '23 at 08:22
39

Another quick and dirty option for a pod that has a restart policy of Always (which CronJobs are not supposed to handle; see the pod template notes under creating a cron job spec) is a livenessProbe that simply tests the time and restarts the pod on a specified schedule.

For example: after startup, wait an hour, then check the hour every minute; if the hour is 3 (AM), fail the probe so the pod restarts, otherwise pass.

livenessProbe:
  exec:
    command:
    - sh
    - -c
    - exit $(test $(date +%H) -eq 3 && echo 1 || echo 0)
  failureThreshold: 1
  initialDelaySeconds: 3600
  periodSeconds: 60

Time granularity is up to how you return the date and test ;)
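
For instance, if you want minute-level precision (as one of the comments below suggests), a variant of the same probe could compare against `HHMM` instead; the `0300` value is only an illustration:

livenessProbe:
  exec:
    command:
    - sh
    - -c
    # fail the probe only during the 03:00 minute; keep periodSeconds at 60
    # or less so the single matching minute is not skipped
    - exit $(test $(date +%H%M) -eq 0300 && echo 1 || echo 0)
  failureThreshold: 1
  initialDelaySeconds: 3600
  periodSeconds: 60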

Of course this does not work if you are already utilizing the liveness probe as an actual liveness probe ¯\_(ツ)_/¯

Ryan Lowe
  • Maybe you should add the minutes in the command, otherwise it will always auto restart at 3:00~3:59. `- exit $(test $(date +%H%M) -eq 0300 && echo 1 || echo 0)` – zzm Apr 12 '19 at 08:19
  • Surely this approach would cause it to continually restart during the specified period, like a whole minute. Using precision to the second makes it potentially possible to miss the check altogether. Maybe checking whether the uptime is greater than 24 hours would be simpler and more appropriate? – Philluminati Apr 18 '19 at 12:37
  • This approach does avoid restart storms by waiting an hour after startup to begin the probe again (initialDelaySeconds), so anywhere between 3:00 and 3:01 it fails, and then once it restarts it waits an hour to start checking time again (with startup time for a fairly large vert.x app ~ 25 seconds, first probes start between 4:01 and 4:02) – Ryan Lowe Apr 19 '19 at 02:27
  • @Philluminati your uptime suggestion is a great one for a regimented 24-hour restart, but if you were trying to schedule the restart for a certain time you would need to time your initial startup to match – Ryan Lowe Apr 19 '19 at 02:38
  • The above liveness command cannot be written on one line this way. You can use `- bash`, `- -c`, and `- exit $(test $(date +%H) -eq 3 && echo 1 || echo 0)` on three separate lines though. – Masood Khaari Jun 25 '19 at 10:39
  • @MassoodKhaari you are correct; since the test is run in the pod's Docker container, the date / test / exit commands are entirely dependent on the container's shell – Ryan Lowe Jul 30 '19 at 19:51
  • @RyanLowe, when I applied this code it automatically restarts every time after initialDelaySeconds has elapsed; it does not evaluate the `- exit $(test $(date +%H%M) -eq 0300 && echo 1 || echo 0)` condition. Can you please help me correct it? – Jatin Patel - JP Sep 13 '19 at 12:31
  • This approach has some downtime. After the liveness probe fails and before the container is restarted, the pod cannot accept traffic. If all containers happen to restart at the same time, there will be a service interruption. – OhJeez Oct 14 '19 at 13:06
  • @OhJeez that is absolutely true, the livenessProbe will cause downtime as all pods from that deployment will restart simultaneously; the cronjob you describe below should be the accepted answer for production :) – Ryan Lowe Jan 24 '20 at 16:02
  • Thank you. I needed a temporary hack because in production my app has an issue that needs a restart every 24 hours. I fixed the code long ago but company policy for deployment is veeeery slow. Thanks again. :) – gyorgyabraham Feb 05 '21 at 09:22
  • Use `sh` not `bash` to be certain it is available everywhere (e.g. in Alpine): `- sh` | `- '-c'` | `- exit $(test $(date +%H) -eq 3 && echo 1 || echo 0)` – mirekphd Feb 16 '21 at 19:59
  • Is there a reason why we wouldn't simplify the bash command to just `exit 1` and then extend the `initialDelaySeconds` to be the frequency we want to restart on? Does it fire more often than expected? – War Gravy Mar 11 '23 at 00:38
  • The simplest way would be to check every hour rather than every minute; otherwise it will restart 60 times when it's 3 AM, since it checks every minute. If it checks every 3600 seconds, it should restart only once per day. – mirageglobe Aug 01 '23 at 14:23
11

I borrowed the idea from @Ryan Lowe but modified it a bit: it will restart any pod older than 24 hours.

      livenessProbe:
        exec:
          command:
            - /bin/sh
            - -c
            - "end=$(date -u +%s); start=$(stat -c %Z /proc/1 | awk '{print int($1)}'); test $(($end-$start)) -lt 86400"
Dmitry
    `/proc/1` is not a reliable source of information. The timestamp can be very different from reality. I would use `ps -p 1 -o etimes --no-headers` when available and when the process ID is known ("1" in my case). – kivagant Jun 16 '20 at 21:35
5

There's a specific resource for that: CronJob

Here an example:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: your-cron
spec:
  schedule: "*/20 8-19 * * 1-5"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: your-periodic-batch-job
        spec:
          containers:
          - name: my-image
            image: your-image
            imagePullPolicy: IfNotPresent
          restartPolicy: OnFailure

Change spec.concurrencyPolicy to Replace if you want the old pod to be replaced when a new pod starts. With Forbid, creation of the new pod is skipped if the old pod is still running.

Nicola Ben
  • It is not clear to me how this works. Does it deploy a new pod and thus Kubernetes automatically remove one of the old pods? – span Apr 16 '19 at 06:52
  • the implication is that the command in `your-image` does something to trigger the pod to restart. – Stuart Harland May 03 '22 at 10:15
  • @StuartHarland Are you sure? My interpretation is that `your-image` is your service, and it runs until the next time the cron job fires, when `Replace` stops it and starts a new instance. Of course it would work the way you suggest too, but it seems unnecessarily complicated. – tgdavies Jun 27 '22 at 01:09
2

According to cronjob-in-kubernetes-to-restart-delete-the-pod-in-a-deployment, you could create a kind: CronJob whose jobTemplate has containers. Your CronJob would then start those containers with an activeDeadlineSeconds of one day (until the restart). For your example, the schedule would be 0 8 * * * for 8:00 AM.
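
A minimal sketch of that idea; the name, image, and concurrency policy below are assumptions for illustration, not taken from the linked post:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-worker              # hypothetical name
spec:
  schedule: "0 8 * * *"           # every day at 8:00 AM
  concurrencyPolicy: Replace      # replace yesterday's run with today's
  jobTemplate:
    spec:
      activeDeadlineSeconds: 86400  # terminate the job's pod after ~one day
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: app
            image: <YOUR IMAGE>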

Andre Albert
1

We were able to do this by modifying the deployment's manifest (passing a random param every 3 hours) from a CRON job:

We specifically used Spinnaker for triggering deployments:

We created a CRON job in Spinnaker like below:

The configuration step is shown in a screenshot (not reproduced here).

The Patch Manifest step is shown in a screenshot (not reproduced here). K8s restarts pods when the YAML changes; to counter the downtime that could cause, see the rolling-update strategy at the bottom of this post.

Since there can be a case where all pods restart at the same time, causing downtime, we have a rolling-update policy where maxUnavailable is 0%:

spec:
  # replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 50%
      maxUnavailable: 0%

This spawns new pods and then terminates old ones.
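
If you are not using Spinnaker, the same "touch the pod template so Kubernetes rolls it" trick can be done with a plain `kubectl patch` from any scheduler; the `restart-trigger` annotation key below is only an illustrative choice, not something the original setup uses:

# bump a pod-template annotation with the current timestamp; changing the
# pod template triggers a rolling update under the strategy shown above
kubectl patch deployment <YOUR DEPLOYMENT NAME> -n <YOUR NAMESPACE> \
  -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"restart-trigger\":\"$(date +%s)\"}}}}}"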

Nikhil Verma
1
livenessProbe:
  exec:
    command:
    - bash
    - -c
    - "exit 1"
  failureThreshold: 1
  periodSeconds: 86400

where 86400 is the desired period in seconds (one restart per day in this example). Note that, unlike a CronJob, this is tied to the probe schedule rather than to a specific clock time.

Serg046