
I am looking for a way to record the status of the pipeline in a DB table. I assume this is a very common use case. Is there a way to record:

  1. status and time of completion of the complete pipeline.
  2. status and time of completion of selected individual activities.
  3. the ID of individual runs/execution.

The only way I found was to use a SqlActivity that depends on an individual activity, but even there I cannot access the status or timestamp of the parent node.

I am using a JDBC connection to connect to a remote SQL Server, and the pipeline copies S3 files into the SQL Server DB.

PyRaider

1 Answer


I haven't tried this, but I can give you some pointers that may get you the desired result. You will have to research and work out the actual implementation yourself.

Option 1

  • Create a ShellCommandActivity whose dependsOn is set to the last activity in your pipeline. The shell script can use the AWS CLI (aws datapipeline list-runs) to get the details of the current run; its filters let you narrow the output.
  • Use data staging to pass the output of that ShellCommandActivity to a SqlActivity, which then inserts it into the destination SQL Server.
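A rough, untested-against-AWS sketch of the script such a ShellCommandActivity could run. It swaps the aws-cli call for the boto3 calls query_objects and describe_objects, because they return structured fields instead of a text table. Assumptions on my side, not from the question: the PIPELINE_ID environment variable, the run_status.csv file name, and the column names are all hypothetical; staging is enabled so OUTPUT1_STAGING_DIR is set; and the instance objects carry the runtime fields @status, @componentParent, @actualStartTime and @actualEndTime.

```python
import csv
import os

COLUMNS = ["instance_id", "activity", "status", "started_at", "finished_at"]


def flatten(obj):
    """Turn one describe_objects entry into a flat row dict."""
    # Each field is {"key": ..., "stringValue": ...} or {"key": ..., "refValue": ...}.
    fields = {f["key"]: f.get("stringValue", f.get("refValue")) for f in obj["fields"]}
    return {
        "instance_id": obj["id"],
        # Fall back to the instance name if the parent reference is missing.
        "activity": fields.get("@componentParent", obj["name"]),
        "status": fields.get("@status"),
        "started_at": fields.get("@actualStartTime"),
        "finished_at": fields.get("@actualEndTime"),
    }


def fetch_instances(client, pipeline_id):
    """Yield one row per instance object, paging through query_objects."""
    marker = None
    while True:
        # limit=25 keeps each page within what describe_objects accepts per call.
        kwargs = {"pipelineId": pipeline_id, "sphere": "INSTANCE", "limit": 25}
        if marker:
            kwargs["marker"] = marker
        page = client.query_objects(**kwargs)
        if page.get("ids"):
            described = client.describe_objects(
                pipelineId=pipeline_id, objectIds=page["ids"]
            )
            for obj in described["pipelineObjects"]:
                yield flatten(obj)
        if not page.get("hasMoreResults"):
            break
        marker = page["marker"]


def write_csv(rows, path):
    """Write the rows as CSV with a header line."""
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


def main():
    import boto3  # imported here so the helpers above work without boto3 installed

    pipeline_id = os.environ["PIPELINE_ID"]  # hypothetical variable name
    out_dir = os.environ.get("OUTPUT1_STAGING_DIR", ".")
    client = boto3.client("datapipeline")
    write_csv(fetch_instances(client, pipeline_id), os.path.join(out_dir, "run_status.csv"))
```

Call main() at the end of the script. Note that this returns instances from every run of the pipeline, plus resource and data node instances, so you would still filter the rows down to the current run and to the activities you care about before loading them.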

Option 2

  • Use AWS Lambda to run aws datapipeline list-runs periodically, with filters, and update the destination table with the latest activities.
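A minimal sketch of what the Lambda side of Option 2 might look like, again untested against a real pipeline. list-runs is a convenience command of the CLI, so from the Python runtime the sketch uses the boto3 calls query_objects and describe_objects instead. Everything named here is an assumption for illustration: the pipeline_run_status table and its columns, the PIPELINE_ID and SQLSERVER_CONN_STR environment variables, and the use of pyodbc (which you would have to package with the function) to reach SQL Server.

```python
import os


def latest_rows(client, pipeline_id):
    """Return (instance_id, activity, status, finished_at) tuples for the pipeline."""
    rows, marker = [], None
    while True:
        kwargs = {"pipelineId": pipeline_id, "sphere": "INSTANCE", "limit": 25}
        if marker:
            kwargs["marker"] = marker
        page = client.query_objects(**kwargs)
        if page.get("ids"):
            described = client.describe_objects(
                pipelineId=pipeline_id, objectIds=page["ids"]
            )
            for obj in described["pipelineObjects"]:
                f = {x["key"]: x.get("stringValue", x.get("refValue")) for x in obj["fields"]}
                rows.append(
                    (obj["id"], f.get("@componentParent"), f.get("@status"), f.get("@actualEndTime"))
                )
        if not page.get("hasMoreResults"):
            return rows
        marker = page["marker"]


def upsert_rows(conn, rows):
    """Update a row if the instance is already recorded, otherwise insert it."""
    cur = conn.cursor()
    for instance_id, activity, status, finished_at in rows:
        cur.execute(
            "UPDATE pipeline_run_status SET activity = ?, status = ?, finished_at = ? "
            "WHERE instance_id = ?",
            (activity, status, finished_at, instance_id),
        )
        if cur.rowcount == 0:  # nothing updated, so this instance is new
            cur.execute(
                "INSERT INTO pipeline_run_status (instance_id, activity, status, finished_at) "
                "VALUES (?, ?, ?, ?)",
                (instance_id, activity, status, finished_at),
            )
    conn.commit()


def lambda_handler(event, context):
    import boto3
    import pyodbc  # assumption: bundled with the function, e.g. as a layer

    client = boto3.client("datapipeline")
    conn = pyodbc.connect(os.environ["SQLSERVER_CONN_STR"])  # hypothetical variable
    try:
        rows = latest_rows(client, os.environ["PIPELINE_ID"])  # hypothetical variable
        upsert_rows(conn, rows)
    finally:
        conn.close()
    return {"recorded": len(rows)}
```

The update-then-insert pattern makes repeated polls idempotent: a scheduled invocation every few minutes just overwrites the status of instances it has already seen. The function also needs network access to the SQL Server and IAM permission for the two Data Pipeline calls.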
Amith Kumar