2

Any ideas/options about handling Data Lineage in Snowflake? We are following a microservice architecture in which we are running a set of stored procedures that contain quite a few SQL queries as soon as certain events are triggered.

Example: When Table A is populated execute SP_Populate_Table_B and the result is that Table B is populated. We have a big set of SPs as we are populating the Staging Area, DataVault and our Dimensional Model.

We are in the lookout for any good way of handling all the metadata around this microservice way of performing our ETL. Basically automated way to track dependencies between tables, visualize the orchestration, have a better way to handle the changes of the SPs when tables are changed etc.

Can you please advice for some frameworks or tools, preferably open-source, that you have tried for Snowflake? Will DBT be a solution to that?

Thank you Pantelis

  • Often data vault (and more frequently other types of data workloads) is generated from metadata. So you already have lineage and dependency defined in your source metadata – Nick.Mc Mar 27 '20 at 11:28
  • sort of, have you already explored doing this with steams and tasks? Example: https://support.snowflake.net/s/article/Building-a-Type-2-Slowly-Changing-Dimension-in-Snowflake-Using-Streams-and-Tasks-Part-2 documentation: https://docs.snowflake.com/en/user-guide/tasks.html and https://docs.snowflake.com/en/user-guide/tasks-intro.html – Rachel McGuigan Apr 01 '20 at 20:41

2 Answers2

0

dbt is a good solution to deploying your warehouse as code, but not a great solution for using your warehouse as a db for services to write intermediary tables.

If you care about data lineage, and you're willing to rethink the SP approach, then I would recommend dbt as a tool to deploy your warehouse infrastructure as code, and easily understand the downstream dependencies of your data.

dbt is great if you are willing to approach everything as an ELT problem, and allow dbt to be the infrastructure that transforms a subset of your mass-loaded data/events, into something that is ready to be analyzed or ingested for BI.

Read this for more context: https://discourse.getdbt.com/t/understanding-idempotent-data-transformations/518

0

I'm not 100% sure if it supports snowflake just yet but I'd highly recommend looking into Packyderm. I believe it was built to solve just this kind of problem.

Might be worth a look or even contributing to if you really want Snowflake support.

sgdata
  • 2,543
  • 1
  • 19
  • 44