I'm running Apache Airflow on a Kubernetes cluster and I'm facing an issue with retrieving logs from deleted pods. I can easily get logs from currently running or recently terminated pods using kubectl logs, but I'm unable to get logs from older, deleted pods.

My versions

awswrangler==2.19.0
apache-airflow-providers-cncf-kubernetes==4.3.0
apache-airflow-providers-amazon==5.0.0
boto3==1.24.56
gnupg==2.3.1
PyYAML==6.0

Here's what I've tried:

# This works fine and gives me the actual log
kubectl logs my-current-pod-239847283947

# This doesn't work for an old, deleted pod
kubectl logs my-old-pod-928374928374

The second command returns the following error:

Error from server (NotFound): pods "my-old-pod" not found

I understand that the pod is deleted, but is there a way to retrieve its logs or at least configure Airflow or Kubernetes to save these logs for future reference?

NOTE: I'm using AWS for storage

Any help would be greatly appreciated!

The Dan
  • You can configure the operator to keep the pod. It's one of the parameters of `KubernetesPodOperator` – Elad Kalif Sep 01 '23 at 16:31
  • What is the name of the parameter? Which version of Airflow introduced it? – The Dan Sep 01 '23 at 16:48
  • Are you using `KubernetesExecutor`, or a different executor with `KubernetesPodOperator`? – Elad Kalif Sep 01 '23 at 16:53
  • Kubernetes pod operator, I found this https://github.com/apache/airflow/blob/332d584f3e270b8cf98384cf126e0d6df1e91b68/airflow/providers/cncf/kubernetes/operators/pod.py#L240 – The Dan Sep 01 '23 at 17:29
  • The issue with `on_finish_action="keep_pod"` is that I don't fully understand (1) how long it keeps the pods, and (2) which pods it keeps. All of them? What if I only want to keep failed pods? – The Dan Sep 01 '23 at 17:31

1 Answer

You might be able to get the logs by following "How to see logs of terminated pods".

However, it's much easier to handle this by simply keeping the pods you need.

By default, Airflow deletes the pod when the task finishes; you simply need to disable that deletion.

For apache-airflow-providers-cncf-kubernetes>=7.2.0:

KubernetesPodOperator(..., on_finish_action='keep_pod')

NOTE: If you don't care about the logs of successful tasks, you can instead set on_finish_action='delete_succeeded_pod', which deletes only the successful pods and leaves the failed ones for further investigation. This offers much more flexibility than older versions of the provider (See PR).

For apache-airflow-providers-cncf-kubernetes<7.2.0:

KubernetesPodOperator(..., is_delete_operator_pod=False)
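To make the version split above concrete, here is a minimal sketch of a helper that picks the right keyword argument for a given provider version. The names `parse_version` and `pod_retention_kwargs`, and the `keep_only_failed` flag, are hypothetical and only for illustration; the only provider parameters used are the two mentioned above. On providers older than 7.2.0 the sketch assumes there is no "keep only failed pods" setting, so it falls back to keeping every pod.

```python
import re


def parse_version(version: str) -> tuple:
    """Turn '7.2.0' (or '7.2.0rc1') into a comparable tuple such as (7, 2, 0)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def pod_retention_kwargs(provider_version: str, keep_only_failed: bool = False) -> dict:
    """Return the KubernetesPodOperator kwarg that stops Airflow deleting pods.

    Hypothetical helper: it only encodes the two options described in this answer.
    """
    if parse_version(provider_version) >= (7, 2, 0):
        # Newer providers: choose between keeping all pods or only failed ones.
        action = "delete_succeeded_pod" if keep_only_failed else "keep_pod"
        return {"on_finish_action": action}
    # Older providers: assumed to offer only the all-or-nothing flag,
    # so every pod is kept regardless of keep_only_failed.
    return {"is_delete_operator_pod": False}


# Usage inside a DAG file (not executed here), with the version from the question:
#   KubernetesPodOperator(task_id="my_task", ..., **pod_retention_kwargs("4.3.0"))
```

With the provider version from the question (4.3.0), this resolves to `is_delete_operator_pod=False`.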

Once you no longer need the pods, you can delete them with kubectl. You can also set up a script (run by Airflow) that cleans up older pods, for example by deleting all pods older than X days.
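A clean-up script along those lines could look roughly like the sketch below. It is only an illustration under a few assumptions: `kubectl` is on the PATH and already authenticated against the cluster, the function names `pods_older_than` and `delete_old_pods` are made up for this example, and, as an extra safeguard not mentioned above, only pods in the `Succeeded` or `Failed` phase are considered so that running pods are never removed.

```python
import json
import subprocess
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def pods_older_than(pod_list: dict, max_age_days: int,
                    now: Optional[datetime] = None) -> List[str]:
    """Return names of finished pods created more than max_age_days ago.

    pod_list is the parsed output of `kubectl get pods -o json`.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    old_pods = []
    for pod in pod_list.get("items", []):
        # Safeguard (an assumption of this sketch): skip pods that are still active.
        if pod.get("status", {}).get("phase") not in ("Succeeded", "Failed"):
            continue
        created = datetime.strptime(
            pod["metadata"]["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if created < cutoff:
            old_pods.append(pod["metadata"]["name"])
    return old_pods


def delete_old_pods(namespace: str, max_age_days: int) -> List[str]:
    """List pods with kubectl, then delete the finished ones older than the cutoff."""
    raw = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    names = pods_older_than(json.loads(raw), max_age_days)
    if names:
        subprocess.run(["kubectl", "delete", "pod", "-n", namespace, *names], check=True)
    return names
```

Calling `delete_old_pods("airflow", 7)` from a `PythonOperator` on a daily schedule would be one way to run it from Airflow; the namespace and the 7-day window are placeholders. Save any logs you still need before the pods are deleted, since `kubectl logs` stops working once they are gone.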

Elad Kalif
  • Awesome! Thanks, I'll check it out and let you know if it's working fine. It's cool that you are collaborating in the development of Airflow – The Dan Sep 01 '23 at 23:13