CrashLoopBackOff indicates that a container is repeatedly crashing after restarting. A container might crash for many reasons, and checking a Pod's logs might aid in troubleshooting the root cause.
Apart from the error message "Does not have minimum availability", there could be other error messages such as "Failed to pull image". However, I recommend identifying the error messages that are relevant to your environment. You can check them with kubectl logs <pod_name> or in the Logs Viewer.
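If you want to script the log check, here is a minimal sketch that only builds the kubectl logs command line. The helper name kubectl_logs_command and the pod name my-pod are placeholders of mine, and the optional --namespace and --previous flags are additions not mentioned above (--previous asks for the logs of the previous, crashed container instance, which is often what you need for a crash loop).

```python
import shlex


def kubectl_logs_command(pod_name, namespace=None, previous=False):
    """Build the argument list for `kubectl logs`; this does not run it."""
    cmd = ["kubectl", "logs", pod_name]
    if namespace:
        # Optional: only needed when the pod is not in the current namespace.
        cmd += ["--namespace", namespace]
    if previous:
        # Optional: logs of the previous container instance (before the restart).
        cmd.append("--previous")
    return cmd


# "my-pod" is a hypothetical pod name used only for illustration.
print(shlex.join(kubectl_logs_command("my-pod", previous=True)))
```

You could pass the returned list to subprocess.run on a machine where kubectl is configured for your cluster.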
For your reference, here are explanations for pod issues:
- CrashLoopBackOff means the image was downloaded but the container failed to run
- ImagePullBackOff means the image was not downloaded
- "Does not have minimum availability" means the pods cannot be placed on the cluster, though not necessarily because of a lack of resources. For instance, there may be nodes available, but the pod is not schedulable on them per the deployment.
- "Insufficient cpu" means there is insufficient CPU on the nodes.
- "Unschedulable" indicates that your Pod cannot be scheduled because of insufficient resources or some configuration error.
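To make the list above easier to apply to raw event or log text, here is a small sketch that looks for those status strings in a message. The function name explain_pod_message and the sample messages are made up for illustration, and the matching is a plain case-insensitive substring search, not anything Kubernetes itself provides.

```python
# Status strings and explanations taken from the list above.
EXPLANATIONS = {
    "CrashLoopBackOff": "the image was downloaded but the container failed to run",
    "ImagePullBackOff": "the image was not downloaded",
    "Does not have minimum availability": "pods cannot be placed, not necessarily for lack of resources",
    "Insufficient cpu": "there is insufficient CPU on the nodes",
    "Unschedulable": "insufficient resources or some configuration error",
}


def explain_pod_message(message):
    """Return {status: explanation} for every known status found in the message."""
    lowered = message.lower()
    return {
        status: text
        for status, text in EXPLANATIONS.items()
        if status.lower() in lowered
    }


# Hypothetical sample messages, only to show the lookup.
for sample in ("Back-off: CrashLoopBackOff", "0/3 nodes are available: 3 Insufficient cpu."):
    for status, text in explain_pod_message(sample).items():
        print(f"{status}: {text}")
```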
With that in mind, here are the steps for creating a logs-based metric and then creating an alert based on it.
Set up a logs-based metric using the following filter:
resource.type="k8s_pod"
severity>=WARNING
unschedulable
You can replace the filter with something more appropriate for your case.
Create a label in the metric that will allow you to identify the pod that was unschedulable (or in another status). This will also help with grouping when you create the alert for a failing pod.
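If you prefer to define the metric outside the console, the filter and the label can be sketched as a request body. This is only my understanding of the LogMetric resource in the Cloud Logging API, so please verify the field names against the current documentation. The metric name pod_unschedulable_count and the function build_log_metric are placeholders of mine.

```python
import json


def build_log_metric(name, keyword="unschedulable"):
    """Sketch of a logs-based counter metric with a pod_name label."""
    # Same three filter lines as above; one condition per line is an implicit AND.
    log_filter = "\n".join([
        'resource.type="k8s_pod"',
        "severity>=WARNING",
        keyword,
    ])
    return {
        "name": name,
        "description": f"Counts k8s_pod log entries matching '{keyword}'",
        "filter": log_filter,
        # Assumption: a DELTA/INT64 counter with one string label.
        "metricDescriptor": {
            "metricKind": "DELTA",
            "valueType": "INT64",
            "labels": [{"key": "pod_name", "valueType": "STRING"}],
        },
        # Assumption: the pod name is available as a resource label on k8s_pod entries.
        "labelExtractors": {"pod_name": "EXTRACT(resource.labels.pod_name)"},
    }


# "pod_unschedulable_count" is a hypothetical metric name.
print(json.dumps(build_log_metric("pod_unschedulable_count"), indent=2))
```

Changing the keyword argument is the equivalent of replacing the filter term with one that suits your case.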
In Stackdriver Monitoring, create an alert with the following parameters.
- Set the resource type to k8s_pod
- Set the metric to the logs-based metric you created earlier
- Set Group By to pod_name (the label you created on the metric)
- In the advanced aggregation section, set the aligner to sum and the Alignment Period to 5m (or whatever you think is more appropriate).
- Configure the condition trigger For to more than 1 minute to prevent the alert from firing over and over. This can also be configured per your requirements.
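For reference, the alert settings above can be sketched as a policy body as well. This follows my understanding of the AlertPolicy resource in the Cloud Monitoring API and is not a verified configuration. The threshold of 0, the REDUCE_SUM reducer, the 120s duration, and the names build_alert_policy and pod_unschedulable_count are assumptions of mine, so adjust them to your setup.

```python
import json


def build_alert_policy(metric_name, alignment_period="300s", duration="120s"):
    """Sketch of an alert policy on a user-defined logs-based metric."""
    metric_filter = (
        f'metric.type="logging.googleapis.com/user/{metric_name}" '
        'AND resource.type="k8s_pod"'
    )
    return {
        "displayName": f"Pod alert on {metric_name}",
        "combiner": "OR",
        "conditions": [{
            "displayName": f"{metric_name} above threshold",
            "conditionThreshold": {
                "filter": metric_filter,
                "aggregations": [{
                    "alignmentPeriod": alignment_period,    # 5m alignment period
                    "perSeriesAligner": "ALIGN_SUM",        # aligner: sum
                    "crossSeriesReducer": "REDUCE_SUM",     # assumption, needed for grouping
                    "groupByFields": ["metric.label.pod_name"],  # Group By pod_name
                }],
                "comparison": "COMPARISON_GT",
                "thresholdValue": 0,   # assumption: alert on any matching log entry
                "duration": duration,  # the "For" setting, more than 1 minute
            },
        }],
    }


# "pod_unschedulable_count" is a hypothetical metric name.
print(json.dumps(build_alert_policy("pod_unschedulable_count"), indent=2))
```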
I hope this information is helpful. If you have any questions, let me know in the comments.