6

When we build a data lake on Google Cloud Storage and process the data with Cloud services such as Dataproc and Dataflow, how can we generate a data lineage report in GCP?

user4157124

2 Answers

4

Google Cloud Platform doesn't have a serverless data lineage offering.

Instead, you may want to install Apache Atlas on Google Cloud Dataproc and use it for data lineage.
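
Once Atlas is running, lineage for an entity can be read back over its REST API. Below is a minimal sketch, not a definitive implementation: it assumes the Atlas v2 lineage endpoint (`/api/atlas/v2/lineage/{guid}`) and a response shaped like `AtlasLineageInfo` (`guidEntityMap` plus `relations`). The host name `atlas-host`, the port `21000`, the GUIDs, and the bucket paths are all made up for illustration, and the helper names are mine.

```python
from urllib.parse import quote, urlencode


def lineage_url(base_url, guid, direction="BOTH", depth=3):
    """Build the Atlas v2 lineage URL for one entity GUID.

    base_url is an assumption, e.g. "http://atlas-host:21000".
    """
    query = urlencode({"direction": direction, "depth": depth})
    return f"{base_url.rstrip('/')}/api/atlas/v2/lineage/{quote(guid, safe='')}?{query}"


def lineage_edges(lineage):
    """Convert a lineage response dict into (source, target) name pairs."""
    entities = lineage.get("guidEntityMap", {})

    def name(guid):
        # Prefer qualifiedName, then name, and fall back to the raw GUID.
        attrs = entities.get(guid, {}).get("attributes", {})
        return attrs.get("qualifiedName") or attrs.get("name") or guid

    return [
        (name(rel["fromEntityId"]), name(rel["toEntityId"]))
        for rel in lineage.get("relations", [])
    ]


# Hypothetical response for a job that reads one GCS path and writes another.
sample = {
    "baseEntityGuid": "g-2",
    "guidEntityMap": {
        "g-1": {"attributes": {"qualifiedName": "gs://my-lake/raw/events"}},
        "g-2": {"attributes": {"name": "clean_events_job"}},
        "g-3": {"attributes": {"qualifiedName": "gs://my-lake/curated/events"}},
    },
    "relations": [
        {"fromEntityId": "g-1", "toEntityId": "g-2"},
        {"fromEntityId": "g-2", "toEntityId": "g-3"},
    ],
}

url = lineage_url("http://atlas-host:21000/", "g-2")
edges = lineage_edges(sample)
for source, target in edges:
    print(f"{source} -> {target}")
```

In practice you would fetch `url` with an authenticated HTTP client and pass the decoded JSON to `lineage_edges`. The resulting edge list is a reasonable starting point for a lineage report.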

Igor Dvorzhak
0

Google Cloud Data Fusion (CDF) supports lineage in the Enterprise edition. You can use CDF to build and orchestrate pipelines and use Dataproc and Dataflow as the capacity for running them. An introduction to CDF lineage can be found in the documentation: https://cloud.google.com/data-fusion/docs/tutorials/lineage

If you do not otherwise use CDF, it is a bit overkill for lineage alone. A lineage capability in Google Cloud Data Catalog would be the better fit, at least for many of my use cases. Unfortunately, Data Catalog does not currently support lineage. I hope it is on the product roadmap and will be supported in the future.

Veikko