
This is driving me nuts.

I'm setting up Airflow in a cloud environment. I have one server running the scheduler and the webserver, and one server as a Celery worker, and I'm using Airflow 1.8.0.

Running jobs works fine. What refuses to work is logging.

I've set up the correct path in airflow.cfg on both servers:

    remote_base_log_folder = s3://my-bucket/airflow_logs/
    remote_log_conn_id = s3_logging_conn
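
For completeness, the surrounding [core] excerpt looks roughly like this (encrypt_s3_logs is shown with what I believe is its default; everything else is untouched):

    [core]
    remote_base_log_folder = s3://my-bucket/airflow_logs/
    remote_log_conn_id = s3_logging_conn
    encrypt_s3_logs = False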

I've set up s3_logging_conn in the Airflow UI, with the access key and the secret key, as described here.

I checked the connection using:

    import airflow.hooks

    # build the hook from the connection configured in the UI and write a test object
    s3 = airflow.hooks.S3Hook('s3_logging_conn')
    s3.load_string('test', 'test', bucket_name='my-bucket')
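
And as a quick round-trip check (a sketch; check_for_key and read_key are the hook methods I'm assuming here):

    # verify the test object actually landed in the bucket and read it back
    assert s3.check_for_key('test', bucket_name='my-bucket')
    print(s3.read_key('test', bucket_name='my-bucket'))  # prints 'test'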

This works on both servers, so the connection is properly set up. Yet all I get whenever I run a task is:

    *** Log file isn't local.
    *** Fetching here: http://*******
    *** Failed to fetch log file from worker.
    *** Reading remote logs...
    Could not read logs from s3://my-bucket/airflow_logs/my-dag/my-task/2018-02-15T21:46:47.577537

I tried manually uploading a log file following the expected naming convention and the webserver still can't pick it up - so the problem is on both ends, writing and reading. I'm at a loss as to what to do; everything I've read so far tells me this should be working. I'm close to just installing 1.9.0, which I hear changes logging, to see if I have more luck.
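
For reference, this is the key layout I was targeting for the manual upload - an illustration of my understanding of the convention, not something lifted from Airflow's source:

    # hypothetical sketch of the remote log key Airflow looks for:
    # <remote_base_log_folder>/<dag_id>/<task_id>/<execution_date>/<try_number>.log
    base = "s3://my-bucket/airflow_logs"
    dag_id, task_id = "my-dag", "my-task"
    execution_date = "2018-02-15T21:46:47.577537"
    try_number = 1
    print("{}/{}/{}/{}/{}.log".format(base, dag_id, task_id, execution_date, try_number))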

UPDATE: I made a clean install of Airflow 1.9 and followed the specific instructions here.

The webserver won't even start now, failing with the following error:

    airflow.exceptions.AirflowConfigException: section/key [core/remote_logging] not found in config

There is an explicit reference to this section in this config template.
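
Presumably the reference is something like the following line, which raises AirflowConfigException when the key is missing (my paraphrase of the template, not an exact quote):

    from airflow import configuration as conf

    # conf.get raises AirflowConfigException for a missing section/key
    REMOTE_LOGGING = conf.get('core', 'remote_logging')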

So I tried removing that reference and just loading the S3 handler without the check, and I got the following error message instead:

    Unable to load the config, contains a configuration error.
    Traceback (most recent call last):
      File "/usr/lib64/python3.6/logging/config.py", line 384, in resolve
        self.importer(used)
    ModuleNotFoundError: No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler'; 'airflow.utils.log.logging_mixin' is not a package
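
As far as I understand, logging.config.dictConfig resolves a handler's 'class' string by importing everything up to the last dot, so the load fails exactly like this when the module part isn't importable. A minimal sketch of the mechanism, with a standard-library handler standing in for the Airflow one:

    import logging.config

    # dictConfig imports the module portion of the 'class' string and then looks
    # up the attribute; swap in a bad dotted path to reproduce the error above
    logging.config.dictConfig({
        "version": 1,
        "handlers": {
            "console": {"class": "logging.StreamHandler"},
        },
        "root": {"handlers": ["console"], "level": "INFO"},
    })
    logging.getLogger(__name__).info("logging config loaded")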

I get the feeling that this shouldn't be this hard.

Any help would be much appreciated, cheers

  • I have now reinstalled everything, generated new credentials *and* upgraded to Airflow 1.9 and the problem persists. – pedrogfp Feb 16 '18 at 00:03
  • Please update the logs with the errors from Airflow 1.9, it should work and some users are actually using this in production. – Fokko Driesprong Feb 19 '18 at 15:44
  • Done, added the new errors. – pedrogfp Feb 19 '18 at 17:50
  • Just a side note, the current template incubator-airflow/airflow/config_templates/airflow_local_settings.py in the master branch contains a reference to the class "airflow.utils.log.s3_task_handler.S3TaskHandler", which is not present in the apache-airflow==1.9.0 Python package. The fix is simple: use this base template instead: https://github.com/apache/incubator-airflow/blob/v1-9-stable/airflow/config_templates/airflow_local_settings.py After having done that, follow all other instructions in the [mentioned answer](https://stackoverflow.com/a/48194903/9979495). Note that this tweak regards s – diogoa Jun 22 '18 at 17:12

1 Answer


Solved:

  1. upgraded to 1.9
  2. ran the steps described in this comment
  3. added the following to airflow.cfg:

    [core]
    remote_logging = True

  4. ran

    pip install --upgrade airflow[log]

Everything's working fine now.
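
For anyone landing here, the resulting [core] excerpt after these steps looks like this - a sketch assuming the 1.8-style remote log keys carry over, with my bucket and connection names:

    [core]
    remote_logging = True
    remote_base_log_folder = s3://my-bucket/airflow_logs/
    remote_log_conn_id = s3_logging_conn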
