
I am thinking of building a workflow as follows:

I have an application that writes almost 1,000 CSV files to a folder MY_DIRECTORY in S3 bucket MY_BUCKET. Now I would like to parse those files from the S3 bucket and load them into a MySQL database using Apache Airflow.

From reading several posts here ("Airflow S3KeySensor - How to make it continue running" and "Airflow s3 connection using UI"), I think it would be best to trigger my Airflow DAG using an AWS Lambda function that is called as soon as a file lands in the S3 folder.

Being new to Airflow and Lambda, I don't see how to set up the Lambda function to trigger an Airflow DAG. If anyone could give some pointers, it would be really helpful. Thanks.

Joy

1 Answer


Create the DAG that you want to trigger, then take advantage of the experimental REST APIs offered by Airflow.

You can read about them here: https://airflow.apache.org/docs/stable/api.html

In particular you want to use the following endpoint:

POST /api/experimental/dags/<DAG_ID>/dag_runs

You pass the DAG ID in the URL to trigger the correct DAG. Moreover, you can explicitly pass the name of the file the DAG will have to process:

curl -X POST \
  http://localhost:8080/api/experimental/dags/<DAG_ID>/dag_runs \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{"conf":"{\"FILE_TO_PROCESS\":\"value\"}"}'

Then use a Hook within the DAG to read the file that you specified.
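For completeness, here is a minimal sketch of what such a DAG might look like, assuming Airflow 1.10-style imports (the era of the experimental API) and an S3 connection configured as aws_default; the DAG ID, bucket name, and task are illustrative only.

from datetime import datetime

from airflow import DAG
from airflow.hooks.S3_hook import S3Hook
from airflow.operators.python_operator import PythonOperator

def process_file(**context):
    # The file name arrives in the conf payload of the triggering dag_run
    key = context["dag_run"].conf["FILE_TO_PROCESS"]

    # S3Hook reads the object through an Airflow connection (here "aws_default")
    hook = S3Hook(aws_conn_id="aws_default")
    csv_content = hook.read_key(key, bucket_name="MY_BUCKET")
    # ... parse csv_content and insert the rows into MySQL ...

dag = DAG(
    dag_id="my_csv_loader",      # must match the DAG_ID used in the REST call
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,      # runs only when triggered externally
)

task = PythonOperator(
    task_id="process_file",
    python_callable=process_file,
    provide_context=True,
    dag=dag,
)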

arocketman