
Below is my Airflow DAG code. It runs perfectly both when Airflow is hosted locally and on Cloud Composer. However, the DAG itself isn't clickable in the Composer UI. I found a similar question and tried the accepted answer linked there; my problem is similar.

import airflow
from airflow import DAG

from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator
from airflow.operators.mysql_operator import MySqlOperator
from airflow.contrib.operators.dataproc_operator import DataprocClusterCreateOperator
from airflow.contrib.operators.dataproc_operator import DataprocClusterDeleteOperator
from airflow.contrib.operators.dataproc_operator import DataProcSparkOperator

from datetime import datetime, timedelta
import sys

# Copy this package to the dags/ directory in the GCP Composer bucket.
from schemas.schemaValidator import loadSchema
from schemas.schemaValidator import sparkArgListToMap

# Change these paths to point to the GCP Composer data directory.

## cluster config
clusterConfig = loadSchema("somePath/jobConfig/cluster.yaml", "cluster")

## per-job YAML config
autoLoanCsvToParquetConfig = loadSchema("somePath/jobConfig/job.yaml", "job")

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2019, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=3)
}

dag = DAG('usr_job', default_args=default_args, schedule_interval=None)

t1 = DummyOperator(task_id="start", dag=dag)

t2 = DataprocClusterCreateOperator(
    task_id="CreateCluster",
    cluster_name=clusterConfig["cluster"]["cluster_name"],
    project_id=clusterConfig["project_id"],
    num_workers=clusterConfig["cluster"]["worker_config"]["num_instances"],
    image_version=clusterConfig["cluster"]["dataproc_img"],
    master_machine_type=clusterConfig["cluster"]["worker_config"]["machine_type"],
    worker_machine_type=clusterConfig["cluster"]["worker_config"]["machine_type"],
    zone=clusterConfig["region"],
    dag=dag
)

t3 = DataProcSparkOperator(
    task_id="csvToParquet",
    main_class=autoLoanCsvToParquetConfig["job"]["main_class"],
    arguments=autoLoanCsvToParquetConfig["job"]["args"],
    cluster_name=clusterConfig["cluster"]["cluster_name"],
    dataproc_spark_jars=autoLoanCsvToParquetConfig["job"]["jarPath"],
    dataproc_spark_properties=sparkArgListToMap(autoLoanCsvToParquetConfig["spark_params"]),
    dag=dag
)

t4 = DataprocClusterDeleteOperator(
    task_id="deleteCluster",
    cluster_name=clusterConfig["cluster"]["cluster_name"],
    project_id=clusterConfig["project_id"],
    dag=dag
)

t5 = DummyOperator(task_id="stop", dag=dag)

t1 >> t2 >> t3 >> t4 >> t5

The UI gives this error - "This DAG isn't available in the webserver DAG bag object. It shows up in this list because the scheduler marked it as active in the metadata database."

And yet, when I triggered the DAG manually on Composer, the log files showed that it ran successfully.
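For reference, here is a minimal parse check, assuming the DAG file and the schemas package sit together in a local dags/ folder. It mirrors how the web server builds its DagBag, collecting import errors instead of raising them:

from airflow.models import DagBag

# Load the folder the way the web server does and surface any parse errors.
dag_bag = DagBag(dag_folder="dags", include_examples=False)
print(dag_bag.import_errors)  # an empty dict means every file parsed cleanly
print(list(dag_bag.dags))     # 'usr_job' should appear here if parsing worked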

sudeepgupta90
  • It's possible that the DAG loads fine in your cluster, but not in the web server project (which is different from your main project). Do you have web server logs from Stackdriver you can post? – hexacyanide Mar 31 '19 at 03:37
  • I checked the logs like you suggested, and nothing seemed out of order: no errors or warnings of any kind on either the web server or the scheduler. And as already mentioned, the DAG executes perfectly without a hitch. It's just that, as linked in the question above, the DAG is not clickable, and I can't see any DAG metrics in the web server UI – sudeepgupta90 Apr 01 '19 at 09:40
  • Can you check your web server logs and ensure that the GCS sync is working from the bucket to the web server? Has this happened with any other DAGs? What IAM roles does your tenant project user have? (the `*-tp@appspot.gserviceaccount.com` user that matches your web server URL) – hexacyanide Apr 01 '19 at 15:34
  • The GCS sync is working, yes (otherwise the DAG wouldn't have executed; plus, I can see the DAG in the UI, it's just not clickable; one way to list what actually syncs is sketched after these comments). I have created the *-tp@appspot.gserviceaccount.com service account and assigned it the owner role for simplicity's sake. I tried another DAG, and it seems the problem is with the particular DAG I shared, yet there are no breakages or error/warn logs anywhere – sudeepgupta90 Apr 02 '19 at 10:17
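For the sync check discussed above, a minimal sketch using the google-cloud-storage client (the bucket name is a placeholder; only objects under dags/ reach the web server):

from google.cloud import storage

# List what the Composer bucket holds under dags/ -- the only prefix
# synced to the web server. Replace the bucket name with your own.
client = storage.Client()
bucket = client.get_bucket("your-composer-bucket")
for blob in bucket.list_blobs(prefix="dags/"):
    print(blob.name)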

1 Answer


The issue was with the path being provided for picking up the configuration files. I was pointing at the data/ folder in GCS. Per the Google documentation, only the dags/ folder is synced to all nodes (including the web server); the data/ folder is not.

Needless to say, it was an issue encountered at DAG parse time, which is why the DAG did not appear correctly in the UI. More interestingly, these debug messages were not exposed in Composer 1.5 and earlier; now they are available to the end user to help with debugging. Thanks anyway to everyone who helped.
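A sketch of the resulting fix, assuming the YAML files are moved under the dags/ folder next to the DAG file (the jobConfig subfolder and file names are carried over from the question):

import os
from schemas.schemaValidator import loadSchema

# Build config paths relative to this DAG file. The file lives in the
# bucket's dags/ folder -- the only folder synced to every node, web
# server included -- so the configs resolve wherever the DAG is parsed.
BASE_DIR = os.path.dirname(os.path.abspath(__file__))

clusterConfig = loadSchema(os.path.join(BASE_DIR, "jobConfig", "cluster.yaml"), "cluster")
autoLoanCsvToParquetConfig = loadSchema(os.path.join(BASE_DIR, "jobConfig", "job.yaml"), "job")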

sudeepgupta90