5

Composer is failing a task because it is unable to read a log file; it complains about incorrect encoding.

Here's the log that appears in the UI:

*** Unable to read remote log from gs://bucket/logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** 'ascii' codec can't decode byte 0xc2 in position 6986: ordinal not in range(128)

*** Log file does not exist: /home/airflow/gcs/logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** Fetching from: http://airflow-worker-68dc66c9db-x945n:8793/log/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-68dc66c9db-x945n', port=8793): Max retries exceeded with url: /log/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1c9ff19d10>: Failed to establish a new connection: [Errno -2] Name or service not known',))

I tried viewing the file in the Google Cloud Console and it also throws an error:

Failed to load

Tracking Number: 8075820889980640204

But I am able to download the file via gsutil.

When I view the file, it seems to have text overriding other text.

I can't show the entire file but it looks like this:

--------------------------------------------------------------------------------
Starting attempt 1 of 1
--------------------------------------------------------------------------------
@-@{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:23,313] {models.py:1569} INFO - Executing <Task(BigQueryOperator): merge_campaign_exceptions> on 2019-08-03T10:00:00+00:00@-@{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:23,314] {base_task_runner.py:124} INFO - Running: ['bash', '-c', u'airflow run __campaign_exceptions_0_0_1 merge_campaign_exceptions 2019-08-03T10:00:00+00:00 --job_id 22767 --pool _bq_pool --raw -sd DAGS_FOLDER//-campaign-exceptions.py --cfg_path /tmp/tmpyBIVgT']@-@{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:24,658] {base_task_runner.py:107} INFO - Job 22767: Subtask merge_campaign_exceptions [2019-08-04 10:01:24,658] {settings.py:176} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800@-@{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}

Where the @-@{} pieces seem to be "on top of" the typical log lines.
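
For what it's worth, the 'ascii' codec error from the UI is the classic symptom of UTF-8 bytes being decoded with the ASCII codec; a minimal, purely illustrative repro (not taken from the actual log):

# Illustrative repro of the UI error (not from the real environment):
# 0xc2 is the lead byte of a two-byte UTF-8 sequence (here a non-breaking space),
# which the ascii codec cannot decode.
data = u"\u00a0".encode("utf-8")   # b'\xc2\xa0'
data.decode("ascii")               # UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 ...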

Andres Lowrie
  • Are you sure the task fails because of the inability to load logs, or perhaps the other way around? It seems more likely that a task wrote logs with non-ASCII or binary content, which is preventing the web UI from showing them. That itself shouldn't have any effect on whether or not the task is able to finish. – hexacyanide Aug 25 '19 at 01:34
  • Not sure, to be honest, and I have no way of checking it, but what you're saying makes sense. Funnily enough, the `@-@{}` pieces still appear in the actual log files in GCS while not appearing in the UI... even on successful tasks. Not sure if this is a GCP bug or an Airflow one; I'm going to dive into their Jira to see if anyone else is seeing this in Airflow (not Composer) – Andres Lowrie Sep 27 '19 at 13:07

4 Answers

3

I faced the same problem. In my case the problem was that I had removed the google_cloud_default connection that was being used to retrieve the logs.

Check the configuration and look for the connection name.

[core]
remote_log_conn_id = google_cloud_default

Then check that the credentials used for that connection have the right permissions to access the GCS bucket.
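
A quick way to check that (the key path below is a placeholder; the bucket and object path are taken from the error in the question): try reading one of the log objects with the same service-account key the connection uses:

# Hypothetical sanity check: can the service account behind google_cloud_default
# actually read the log objects? Requires the google-cloud-storage package.
from google.cloud import storage

client = storage.Client.from_service_account_json("/path/to/connection-keyfile.json")  # placeholder path
bucket = client.bucket("bucket")  # the remote_base_log_folder bucket
blob = bucket.blob("logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log")
print(blob.download_as_string()[:200])  # a 403 here points at missing bucket permissions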

frantracer
0

I'm having a similar problem with viewing logs in GCP Cloud Composer, though it doesn't appear to be preventing the failing DAG task from running. It looks like a permissions error between GKE and the Storage bucket where the log files are kept.

You can still view the logs by going into your cluster's storage bucket; in the same directory as your /dags folder you should also see a logs/ folder.
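
If you prefer to browse them programmatically, a small sketch (the bucket name is a placeholder; Composer creates one per environment):

# List task logs directly from the Composer environment bucket.
from google.cloud import storage

bucket = storage.Client().bucket("your-composer-environment-bucket")  # placeholder name
for blob in bucket.list_blobs(prefix="logs/campaign_exceptions_0_0_1/"):
    print(blob.name)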

Michael
0

Your Helm chart should set up a global env var:

- name: AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT
value: "google-cloud-platform://"

Then, you should deploy a Dockerfile that uses the root account only (not the airflow account); additionally, set up your Helm uid and gid as:

uid: 50000 #airflow user
gid: 50000 #airflow group

Then upgrade the Helm chart with the new config.
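
As a sanity check afterwards (Airflow 1.10-style import assumed), you can confirm that Airflow resolves the connection from that environment variable rather than from the metadata DB:

# Airflow resolves connections from AIRFLOW_CONN_<CONN_ID> environment variables,
# so google_cloud_default should resolve even without a DB entry.
from airflow.hooks.base_hook import BaseHook  # airflow.hooks.base in Airflow 2.x

conn = BaseHook.get_connection("google_cloud_default")
print(conn.conn_type)  # connection type parsed from the google-cloud-platform:// URI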

0

*** Unable to read remote log from gs://bucket

Found the solution after doing the following:

1) Assign the required roles to the service account.

2) Add the SA key (JSON or txt) and configure it on the connection referenced by

remote_log_conn_id = google_cloud_default

3) Restart the Airflow scheduler and webserver.

4) Restart the DAGs in Airflow.

You can then find the logs in the GCS bucket where they are configured.
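
After the restarts, one hedged way to verify the fix end-to-end is to pull a log object through the same connection Airflow's remote logging uses (Airflow 1.10 contrib import path assumed; the bucket and object names below are placeholders, not from the original question):

# Read a remote task log through the google_cloud_default connection.
from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook

hook = GoogleCloudStorageHook(google_cloud_storage_conn_id="google_cloud_default")
content = hook.download(
    bucket="bucket",
    object="logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log",
)
print(content[:200])  # a permission error here means the roles or key are still wrong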