I'm trying to get data lineage metadata like data source/schema and data target/schema in a custom Action plugin which gets executed after the successful run of the other steps in the pipeline.

I have a basic Action plugin that executes but I'm having trouble finding a way to get the metadata I'm after.

The use case I'm working on is pushing data lineage into a third party data governance tool.

I would very much appreciate it if someone could point me in the right direction!

Jesse
Vaughn
  • Can you share more information about the pipeline workflow and what you mean by "data source/schema and data target/schema"? – Nick_Kh Mar 19 '21 at 09:22
  • I'm using a very simple example pipeline which takes a CSV file from Google Cloud Storage, does some minor transformation and loads it into BigQuery. Once this is complete, I'd like my Action plugin to execute and push metadata into a third party system to track data lineage. – Vaughn Mar 23 '21 at 04:00
  • Basically, I want to create a generic plugin which can be added to any existing Data Fusion pipeline which will execute after the pipeline has been successfully executed and will detect the metadata of the input source, detect the metadata of the output target and push this metadata into the third party system. The only configuration needed in the plugin should be the REST endpoint which this metadata should be posted to. – Vaughn Mar 23 '21 at 04:15
  • Have you considered using the HTTPCallback [plugin](https://github.com/cdapio/hydrator-plugins/blob/develop/http-plugins/docs/HTTPCallback-postaction.md) after the end of the run? – Nick_Kh Mar 24 '21 at 13:09
  • The method of posting data isn't the concern; it's getting access to the appropriate metadata from the context of the action plugin that I'm having trouble with. – Vaughn Mar 24 '21 at 21:44
  • What about the CDAP [Metadata Microservices](https://cdap.atlassian.net/wiki/spaces/DOCS/pages/477692187/Metadata+Microservices)? I would think you can use the RESTful HTTP API to fetch the metadata. Does that make sense here? – Nick_Kh Mar 25 '21 at 12:12
  • Yeah that may be the only way. Thanks for the suggestion – Vaughn Mar 26 '21 at 11:09

1 Answer

As I suggested in my comment, you could use the CDAP system metadata to extract a particular property of the desired entity through the existing CDAP RESTful API, sending the appropriate HTTP request as explained in the CDAP Metadata Microservices documentation. These endpoints can also return the lineage of dataset fields, with the result in JSON format.

However, the right HTTP request depends mostly on the particular use case, so feel free to contribute further and share what you discover.
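
To make the suggestion more concrete, here is a minimal Python sketch for exploring the metadata endpoints from outside the pipeline. It is not a definitive implementation. The path layouts (`/v3/namespaces/<namespace>/datasets/<dataset>/metadata/properties` and `.../lineage`), the `start`/`end`/`levels` query parameters, and the relative time values such as `now-1d` are my reading of the Metadata Microservices documentation and should be checked against the CDAP version you run. The helper names, the base URL, and the dataset name are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def _dataset_path(namespace, dataset):
    # Assumed path layout; verify against the Metadata Microservices docs.
    return "/v3/namespaces/{}/datasets/{}".format(
        urllib.parse.quote(namespace, safe=""),
        urllib.parse.quote(dataset, safe=""),
    )


def properties_url(base_url, namespace, dataset):
    """URL for the metadata properties of a dataset."""
    return base_url.rstrip("/") + _dataset_path(namespace, dataset) + "/metadata/properties"


def lineage_url(base_url, namespace, dataset, start="now-1d", end="now", levels=1):
    """URL for the lineage of a dataset over a time range."""
    query = urllib.parse.urlencode({"start": start, "end": end, "levels": levels})
    return base_url.rstrip("/") + _dataset_path(namespace, dataset) + "/lineage?" + query


def fetch_json(url, token=None):
    """GET a URL and decode the JSON body; the bearer token is optional."""
    request = urllib.request.Request(url)
    if token:
        request.add_header("Authorization", "Bearer " + token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical endpoint and dataset name, for illustration only.
    base = "https://example.invalid/api"
    print(properties_url(base, "default", "my_table"))
    print(lineage_url(base, "default", "my_table"))
```

Once a URL returns the JSON you expect, the same request can be issued from the Action plugin and the response forwarded to the third party REST endpoint.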

Nick_Kh