
I am not able to see the logs attached to the tasks in the Airflow UI.

Log-related settings in the airflow.cfg file are:

  • remote_base_log_folder =
  • base_log_folder = /home/my_projects/ksaprice_project/airflow/logs
  • worker_log_server_port = 8793
  • child_process_log_directory = /home/my_projects/ksaprice_project/airflow/logs/scheduler

Although I am setting remote_base_log_folder, the UI is trying to fetch the log from http://:8793/log/tutorial/print_date/2017-08-02T00:00:00, and I don't understand this behavior. According to these settings the workers should store the logs at /home/my_projects/ksaprice_project/airflow/logs, and the UI should fetch them from that same location instead of from a remote worker.
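(For context: when the webserver cannot read a task's log file locally, it fetches it over HTTP from the worker that ran the task, building the URL from the hostname recorded on the task instance and worker_log_server_port. A sketch of the pattern as I read it from the URL above; the angle-bracket placeholders are mine, not taken from the docs:

http://<hostname>:<worker_log_server_port>/log/<dag_id>/<task_id>/<execution_date>

The empty host in http://:8793/... would then mean the recorded hostname is empty.)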

Update: task_instance table contents (screenshot; the hostname column is an empty string and most task instances are in the queued state).

Javed
  • What mode are you running Airflow in: Local or Celery? Try checking the following URL, as there is an elaborate discussion of the topic there: https://github.com/puckel/docker-airflow/issues/44 – Saurabh Mishra Aug 03 '17 at 10:17
  • using CeleryExecutor – Javed Aug 03 '17 at 10:52
  • 1
    could you check in the DB configured - table - task_instance . This table has column named 'hostname' from where the log URL is built and sourced. Ideally this value is same as what you get on running 'hostname' command on your worker node. – Saurabh Mishra Aug 03 '17 at 11:34
  • hostname column is empty string: '' – Javed Aug 03 '17 at 11:43
  • I see most of your task instances are in the queued state, so an empty hostname is reasonable. Did the only 'success' task instance give you the desired output? Can you try running some basic operators like BashOperator and see if they are received by the worker? – Saurabh Mishra Aug 03 '17 at 12:05
  • One more problem I am facing is that the scheduler pushes to the queue defined in airflow.cfg, but the worker is listening on some other queue for some reason, so my tasks are not getting executed by the worker. I am using the RabbitMQ broker. When I check the RabbitMQ UI, a queue named celeryev.2708e0df-7957-4e63-add9-b11beaabe6eb is generated automatically and the worker listens on it, even if I do: `airflow worker -p celeryev.2708e0df-7957-4e63-add9-b11beaabe6eb` – Javed Aug 03 '17 at 12:13
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/150954/discussion-between-javed-and-saurabh-mishra). – Javed Aug 03 '17 at 12:15

3 Answers


I also faced the same problem.

Setting the variables below in airflow.cfg worked for me. Use the machine's FQDN for {hostname} instead of localhost.

endpoint_url = http://{hostname}:8080

base_url = http://{hostname}:8080
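For reference, in a stock airflow.cfg these two settings live in different sections; a minimal sketch, assuming airflow.example.com is the machine's FQDN:

[cli]
endpoint_url = http://airflow.example.com:8080

[webserver]
base_url = http://airflow.example.com:8080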

Best of luck!

kgangadhar
Nilesh Gavali
  • 1
    the base_url is certainly important, many of the pages in the UI use it to build links dynamically. The endpoint_url appears to be used by the cli only, so I doubt it helps with this issue. – Davos Dec 12 '17 at 00:41
  • See https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L984 for example where the log filepath is generated, and the following method log_url which uses the base_url config value. – Davos Dec 12 '17 at 01:08
  • Thanks! This fixed the issue for me. – jj080808 Oct 15 '18 at 22:17
  • I am not able to open the URL. Can you elaborate on the fix? – shiv Sep 20 '19 at 09:46

As you can see in the question's screenshot, there is a timestamp; make sure your logs directory contains a folder/file with that timestamp as its name.

You are looking at the UI, so first make sure the log files were actually created in the directory. In my case the log folder looks like this:

(AIRFLOW-ENV) [cloudera@quickstart dags]$ ll /home/cloudera/workspace/python/airflow_home/logs/my_test_dag/my_sensor_task 
total 8
-rw-rw-rw- 1 cloudera cloudera 3215 Nov 14 08:45 2017-11-12T12:00:00
-rw-rw-rw- 1 cloudera cloudera 2694 Nov 14 08:45 2017-11-14T08:36:06.920727
(AIRFLOW-ENV) [cloudera@quickstart dags]$ 

So my log URL is

http://localhost:8080/admin/airflow/log?task_id=my_sensor_task&dag_id=my_test_dag&execution_date=2017-11-14T08:36:06.920727
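The execution_date in the URL matches the log file name exactly, so you can check from the shell whether the log for a particular run exists (using my paths above):

ls /home/cloudera/workspace/python/airflow_home/logs/my_test_dag/my_sensor_task/2017-11-14T08:36:06.920727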

When you go to your DAG and select the Graph View, there is a dropdown next to "Run": select the appropriate run, then in the graph below select the appropriate task/operator and choose View Log.

Dave
  • I think I have this problem, but I don't know why the log file with the correct timestamp is not generated. Anything obvious that could be happening? – Rohit Negi Dec 22 '19 at 06:02

I ran into this as well, and had to unpause the DAGs.

I also set new DAGs to default to unpaused in my airflow.cfg:

dags_are_paused_at_creation = False
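Note that dags_are_paused_at_creation only affects DAGs created after the change; a DAG that is already paused still has to be unpaused, either with the on/off toggle in the DAGs list view or from the CLI. A sketch, assuming the 1.x-era CLI and the tutorial DAG from the question:

airflow unpause tutorial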