Note
- Please read and understand the question thoroughly
- It cannot be solved by a simple BranchPythonOperator / ShortCircuitOperator

We have an unusual, multiplexer-like use-case in our workflow:
                   +-----------------------+
                   |                       |
     +------------>+  branch-1.begin-task  |
     |             |                       |
     |             +-----------------------+
     |
     |             +-----------------------+
     |             |                       |
     +------------>+  branch-2.begin-task  |
     |             |                       |
+----+-------+     +-----------------------+
|            |
|  MUX-task  +- -- -- -- ->      ...
|            |
+----+-------+     +-----------------------+
     |             |                       |
     +------------>+  branch-n.begin-task  |
                   |                       |
                   +-----------------------+
The flow is expected to work as follows:
- the MUX-task listens for events on an external queue (a single queue)
- each event on the queue triggers execution of one of the branches (branch-n.begin-task)
- one by one, as events arrive, the MUX-task must trigger execution of the respective branch
- once all branches have been triggered, the MUX-task completes
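To make the intended semantics concrete, here is a minimal plain-Python simulation of the MUX-task's dispatch loop (no Airflow involved; the queue, the event payload field and the trigger callback are all hypothetical stand-ins):

```python
from collections import deque

def run_mux(queue, n, trigger_branch):
    """Consume exactly n events from a single queue; each event names the
    branch it should trigger. Completes once all n branches were triggered."""
    triggered = []
    while len(triggered) < n:
        event = queue.popleft()      # the real system would block/poll here
        branch = event["branch"]     # assumed payload field naming the branch
        trigger_branch(branch)       # e.g. make branch-<i>.begin-task runnable
        triggered.append(branch)
    return triggered

# usage: three events arrive one by one, each triggering a different branch
events = deque({"branch": f"branch-{i}.begin-task"} for i in (2, 1, 3))
order = run_mux(events, n=3, trigger_branch=lambda b: None)
# order preserves arrival order: branch-2, branch-1, branch-3
```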
Assumptions
- Exactly n events arrive on the queue, one for triggering each branch
- n is known only dynamically: its value is defined in a Variable
Limitations
- There is only one external queue where events arrive
- we can't have n queues (one per branch), since the set of branches grows over time (n is dynamically defined)
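For illustration, the dynamic branch count could be read at DAG-parse time to generate the branch tasks; a sketch assuming a Variable named branch_count (Variable.get is stubbed here so the snippet runs without Airflow):

```python
# Stand-in for airflow.models.Variable.get, so the sketch runs without Airflow.
# The Variable name "branch_count" is an assumption for illustration only.
_FAKE_VARIABLES = {"branch_count": "3"}

def variable_get(key, default_var=None):
    return _FAKE_VARIABLES.get(key, default_var)

# n is only known at parse time; branch task ids are generated from it
n = int(variable_get("branch_count", default_var="0"))
branch_task_ids = [f"branch-{i}.begin-task" for i in range(1, n + 1)]
```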
We have not been able to come up with a solution using Airflow's set of operators and sensors (or anything else available out-of-the-box in Airflow) to build this:
- Sensors can be used to listen for events on an external queue; but here we have to listen for multiple events, not one
- BranchPythonOperator can be used to trigger execution of a single branch out of many, but it immediately marks the remaining branches as skipped
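The all-or-nothing skip behaviour is the crux: once the branch callable returns a task id, every other downstream branch is marked skipped and can never be triggered by a later event. A rough plain-Python simulation of that semantics (not Airflow code):

```python
def branch_outcome(chosen_task_id, all_branch_task_ids):
    """Mimics BranchPythonOperator semantics: the chosen branch is
    scheduled, all remaining branches are immediately marked skipped."""
    return {
        task_id: ("scheduled" if task_id == chosen_task_id else "skipped")
        for task_id in all_branch_task_ids
    }

# After the first event picks branch-1, branches 2..n are already skipped,
# so later events have nothing left to trigger:
states = branch_outcome(
    "branch-1.begin-task",
    ["branch-1.begin-task", "branch-2.begin-task", "branch-3.begin-task"],
)
```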
Primary bottleneck
Because of the 2nd limitation above, even a custom operator combining the functionality of a Sensor and a BranchPythonOperator won't work.
We have tried to brainstorm around fancy combinations of Sensors, DummyOperators and trigger_rules to achieve this, but have had no success thus far.
Is this doable in Airflow?
UPDATE-1
Here's some background info to understand the context of the workflow:
- we have an ETL pipeline to sync MySQL tables (across multiple Aurora databases) to our data-lake
- to limit the impact of our sync pipeline on the production databases, we have decided to do this:
  - for each database, create a snapshot (restore an AuroraDB cluster from the last backup)
  - run the MySQL sync pipeline against that snapshot
  - at the end of the sync, terminate the snapshot (AuroraDB cluster)
- the lifecycle events of the Aurora snapshot-restore process are published to an SQS queue - a single queue for all databases
- this setup was done by our DevOps team (different AWS account; we don't have access to the underlying Lambdas / SQS / infra)