Questions tagged [google-cloud-composer]

Google Cloud Composer is a fully managed workflow orchestration service, built on Apache Airflow, that empowers you to author, schedule, and monitor pipelines that span across clouds and on-premises data centers.

Cloud Composer is a product of Google Cloud Platform (GCP). It is essentially "hosted/managed Apache Airflow."

The product allows you to create, schedule, and monitor jobs, each job being represented as a DAG (directed acyclic graph) of various operators. You can use Airflow's built-in operator definitions and/or define your own in pure Python.
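For instance, a minimal DAG file might look like the sketch below (the DAG id, schedule, and task logic are illustrative only):

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.operators.python_operator import PythonOperator

    default_args = {
        "owner": "airflow",
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    }

    def greet():
        print("Hello from Cloud Composer")

    with DAG(
        dag_id="example_dag",               # hypothetical DAG id
        default_args=default_args,
        start_date=datetime(2020, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = BashOperator(task_id="extract", bash_command="echo extracting")
        transform = PythonOperator(task_id="transform", python_callable=greet)
        extract >> transform                # '>>' declares the edge in the graph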

While technically you can do data processing directly within a task (instantiated operator), more often you will want a task to invoke some sort of processing in another system (which could be anything - a container, BigQuery, Spark, etc). Often you will then wait for that processing to complete using an Airflow sensor operator, possibly launch further dependent tasks, etc.
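A hedged illustration of that pattern, assuming Airflow 1.10-era import paths (the bucket, object, and command are hypothetical):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.contrib.sensors.gcs_sensor import GoogleCloudStorageObjectSensor

    with DAG("export_and_wait", start_date=datetime(2020, 1, 1),
             schedule_interval="@daily", catchup=False) as dag:
        start_export = BashOperator(
            task_id="start_export",
            # stands in for whatever kicks off work in another system
            # (a gcloud/bq invocation, an API call, a Spark submit, ...)
            bash_command="echo 'submitting work to an external system'",
        )
        wait_for_output = GoogleCloudStorageObjectSensor(
            task_id="wait_for_output",
            bucket="my-output-bucket",      # hypothetical
            object="exports/result.csv",    # hypothetical
            poke_interval=60,               # re-check once a minute
        )
        start_export >> wait_for_output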

While Cloud Composer is managed, you can apply a variety of customizations, such as specifying which pip modules to install, hardware configurations, environment variables, etc. Cloud Composer allows overriding some but not all Airflow configuration settings.
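The gcloud CLI exposes these customizations; for example (environment name, location, and values below are placeholders):

    # install or upgrade a PyPI package in the environment
    gcloud composer environments update my-env --location us-central1 \
        --update-pypi-package pandas==0.25.3

    # set an environment variable visible to all Airflow processes
    gcloud composer environments update my-env --location us-central1 \
        --update-env-variables MY_FLAG=1

    # override an Airflow config option (format: section-key=value)
    gcloud composer environments update my-env --location us-central1 \
        --update-airflow-configs core-dags_are_paused_at_creation=True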

Further technical details: Cloud Composer will create a Kubernetes cluster for each Airflow environment you create. This is where your tasks will be run, but you don't have to manage it. You will place your code within a specific bucket in Cloud Storage, and Cloud Composer will sync it from there.
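Deploying a DAG is therefore just a copy into that bucket, e.g. (the bucket name is illustrative; the real one is shown on the environment's details page):

    gsutil cp my_dag.py gs://us-central1-my-env-1234abcd-bucket/dags/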

1225 questions
65
votes
3 answers

How to control the parallelism or concurrency of an Airflow installation?

In some of my Apache Airflow installations, DAGs or tasks that are scheduled to run do not run even when the scheduler doesn't appear to be fully loaded. How can I increase the number of DAGs or tasks that can run concurrently? Similarly, if my…
hexacyanide
  • 88,222
  • 31
  • 159
  • 162
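The answers revolve around a handful of settings at different levels; a hedged summary using Airflow 1.10-era names, since exact keys vary by version:

    # airflow.cfg level (on Composer, set via Airflow configuration overrides):
    #   [core]   parallelism             - max task instances running across the installation
    #   [core]   dag_concurrency         - max running tasks per DAG
    #   [core]   max_active_runs_per_dag - max simultaneous DagRuns per DAG
    #   [celery] worker_concurrency      - tasks each worker process will take
    #
    # The same limits can be narrowed per DAG or per task in code:
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    dag = DAG(
        dag_id="concurrency_example",   # hypothetical
        start_date=datetime(2020, 1, 1),
        schedule_interval="@daily",
        concurrency=8,                  # max running tasks in this DAG at once
        max_active_runs=2,              # max simultaneous runs of this DAG
    )

    capped = DummyOperator(task_id="capped", dag=dag, task_concurrency=1)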
20
votes
4 answers

Using Dataflow vs. Cloud Composer

I'd like to get some clarification on whether Cloud Dataflow or Cloud Composer is the right tool for the job, and I wasn't clear from the Google Documentation. Currently, I'm using Cloud Dataflow to read a non-standard csv file -- do some basic…
user10503628
19
votes
10 answers

Cloud Composer: "PERMISSION_DENIED: The caller does not have permission"

I implemented a few tasks with BashOperator. Ones with "gsutil rm" and "gsutil cp" worked fine. But one with "gcloud alpha firestore export" generates this error: {bash_operator.py:101} INFO - ERROR: (gcloud.alpha.firestore.export)…
kee
  • 10,969
  • 24
  • 107
  • 168
13
votes
5 answers

Airflow DAG fails in PythonOperator with error "Negsignal.SIGKILL"

I am running Airflow v1.10.15 on Cloud Composer v1.16.16. My DAG looks like this: from datetime import datetime, timedelta # imports from airflow import DAG from airflow.operators.python_operator import PythonOperator from…
13
votes
2 answers

Confused about Airflow's BaseSensorOperator parameters: timeout, poke_interval, and mode

I have a bit of confusion about the way BaseSensorOperator's parameters work: timeout & poke_interval. Consider this usage of the sensor: BaseSensorOperator( soft_fail=True, poke_interval = 4*60*60, # Poke every 4 hours timeout = 12*60*60, …
Imad
  • 2,358
  • 5
  • 26
  • 55
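A hedged reading of those parameters: poke_interval is the pause between successive checks, timeout is the total time before the sensor gives up (failing, or skipping when soft_fail=True), and mode decides whether the worker slot is held ("poke") or released between checks ("reschedule"). The sensor and its condition below are invented for illustration:

    import os
    from airflow.sensors.base_sensor_operator import BaseSensorOperator

    class FileReadySensor(BaseSensorOperator):  # hypothetical sensor
        def poke(self, context):
            # called once per poke_interval; return True to stop waiting
            return os.path.exists("/tmp/ready")

    wait = FileReadySensor(
        task_id="wait",
        poke_interval=4 * 60 * 60,  # re-check every 4 hours
        timeout=12 * 60 * 60,       # give up 12 hours after the first poke
        soft_fail=True,             # on timeout, mark SKIPPED instead of FAILED
        mode="reschedule",          # free the worker slot between pokes
    )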
13
votes
2 answers

How to manage Python dependencies in Airflow?

On my local machine I created a virtualenv and installed Airflow. When a DAG or plugin requires a Python library, I pip install it into the same virtualenv. How can I keep track of which libraries belong to a DAG, and which are used for airflow…
Srule
  • 430
  • 1
  • 4
  • 10
11
votes
1 answer

DAGs not clickable on Google Cloud Composer webserver, but working fine on a local Airflow

I'm using Google Cloud Composer (managed Airflow on Google Cloud Platform) with image version composer-0.5.3-airflow-1.9.0 and Python 2.7, and I'm facing a weird issue: after importing my DAGs, they are not clickable from the Web UI (and there are…
norbjd
  • 10,166
  • 4
  • 45
  • 80
10
votes
2 answers

Cloud Composer vs Cloud Scheduler

I am currently studying for the GCP Data Engineer exam and have struggled to understand when to use Cloud Scheduler and when to use Cloud Composer. From reading the docs, I have the impression that Cloud Composer should be used when there is…
10
votes
6 answers

Google Cloud Composer and Google Cloud SQL

What ways do we have available to connect to a Google Cloud SQL (MySQL) instance from the newly introduced Google Cloud Composer? The intention is to get data from a Cloud SQL instance into BigQuery (perhaps with an intermediary step through Cloud…
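One commonly suggested shape, sketched with Airflow 1.10 contrib operators (the connection id, bucket, and table names are hypothetical, and reaching a private Cloud SQL instance may additionally require the Cloud SQL Proxy):

    from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperator
    from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

    extract = MySqlToGoogleCloudStorageOperator(
        task_id="cloudsql_to_gcs",
        mysql_conn_id="my_cloudsql_conn",      # hypothetical Airflow connection
        sql="SELECT * FROM my_table",
        bucket="my-staging-bucket",            # hypothetical staging bucket
        filename="exports/my_table_{}.json",   # {} is replaced per file chunk
    )

    load = GoogleCloudStorageToBigQueryOperator(
        task_id="gcs_to_bq",
        bucket="my-staging-bucket",
        source_objects=["exports/my_table_*.json"],
        destination_project_dataset_table="my_dataset.my_table",
        source_format="NEWLINE_DELIMITED_JSON",
        write_disposition="WRITE_TRUNCATE",
        autodetect=True,                       # or pass schema_fields explicitly
    )

    extract >> load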
9
votes
1 answer

How does a Dataproc Spark operator return a value and how to capture and return it

How does a Dataproc Spark operator in Airflow return a value, and how can I capture it? I have a downstream job which captures this result, and based on the returned value, I have to trigger another job with a branch operator.
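The Dataproc operators don't hand the job's output back directly; one common workaround is to have the Spark job write its result to GCS and let a BranchPythonOperator read it and pick the downstream task. A hedged sketch with hypothetical bucket/object and task names:

    from airflow.operators.python_operator import BranchPythonOperator
    from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook

    def choose_next(**context):
        # the Spark job is assumed to have written its result to this object
        result = GoogleCloudStorageHook().download(
            bucket="my-results-bucket",   # hypothetical
            object="spark/result.txt",    # hypothetical
        )
        # return the task_id of the branch to follow
        return "job_a" if result.strip() == b"ok" else "job_b"

    branch = BranchPythonOperator(
        task_id="branch_on_result",
        python_callable=choose_next,
        provide_context=True,
    )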
9
votes
1 answer

How to use connection hooks with `KubernetesPodOperator` as environment variables on Apache Airflow on GCP Cloud Composer

I'd like to use connections saved in airflow in a task which uses the KubernetesPodOperator. When developing the image I've used environment variables to pass database connection information down to the container, but the production environment has…
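One hedged approach, assuming Airflow 1.10 import paths: resolve the connection with BaseHook and forward its fields as env_vars. The connection id and image are hypothetical; note the lookup runs every time the DAG file is parsed:

    from airflow.hooks.base_hook import BaseHook
    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

    conn = BaseHook.get_connection("my_db")         # hypothetical connection id

    run_container = KubernetesPodOperator(
        task_id="run_container",
        name="run-container",
        namespace="default",
        image="gcr.io/my-project/my-image:latest",  # hypothetical image
        env_vars={
            "DB_HOST": conn.host or "",
            "DB_USER": conn.login or "",
            "DB_PASSWORD": conn.password or "",
        },
    )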
9
votes
5 answers

Broken DAG: (...) No module named docker

I have BigQuery connectors all running, but I have some existing scripts in Docker containers I wish to schedule on Cloud Composer instead of App Engine Flexible. I have the below script that seems to follow the examples I can find: import…
MarkeD
  • 2,500
  • 2
  • 21
  • 35
8
votes
2 answers

Google Cloud Composer (Airflow) - Dataflow job inside a DAG executes successfully, but the DAG fails

My DAG looks like this default_args = { 'start_date': airflow.utils.dates.days_ago(0), 'retries': 0, 'dataflow_default_options': { 'project': 'test', 'tempLocation': 'gs://test/dataflow/pipelines/temp/', …
8
votes
3 answers

No module named airflow.gcp - how to run a Dataflow job that uses Python 3 / Beam 2.15?

When I go to use operators/hooks like the BigQueryHook, I see a message that these operators are deprecated and to use the airflow.gcp... operator version. However, when I try to use it in my DAG, it fails and says no module named airflow.gcp. I have…
WIT
  • 1,043
  • 2
  • 15
  • 32
8
votes
2 answers

Retry Airflow task instance only on certain Exception

What's the best way to retry an Airflow operator only for certain failures/exceptions? For example, let's assume that I have an Airflow task which relies on the availability of an external service. If this service becomes unavailable during the task…
tsabsch
  • 2,131
  • 1
  • 20
  • 28
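One hedged pattern, assuming an Airflow version that ships AirflowFailException (the service call and error class below are stand-ins): let retryable errors propagate normally so the task's configured retries apply, and convert permanent ones into an immediate, non-retried failure:

    from airflow.exceptions import AirflowFailException
    from airflow.operators.python_operator import PythonOperator

    class TransientServiceError(Exception):
        """Stand-in for an error that signals a temporary outage."""

    def call_service():
        """Stand-in for the real call to the external service."""

    def guarded_call():
        try:
            call_service()
        except TransientServiceError:
            raise                      # ordinary failure: retries still apply
        except Exception as err:
            # permanent problem: fail at once, bypassing remaining retries
            raise AirflowFailException(str(err))

    task = PythonOperator(
        task_id="call_service",
        python_callable=guarded_call,
        retries=3,                     # consumed only by transient failures
    )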