Questions tagged [data-governance]
21 questions
5
votes
1 answer
Catalogs in Databricks
I have started reading about the Unity Catalog that Databricks has introduced. I understand the basic issue that it is trying to solve, but I do not understand what exactly a Catalog is.
This was available in the Databricks documentation,
A catalog…

Minura Punchihewa
3
votes
3 answers
Can we use Microsoft Purview and Unity Catalog together
Unity Catalog is the Azure Databricks data governance solution for the Lakehouse. Microsoft Purview, by contrast, provides a unified data governance solution to help manage and govern your on-premises, multicloud, and software as a service (SaaS)…

nam
2
votes
1 answer
How to maintain order of data frame when making pandas pivot table
I am trying to make a heat map out of a pivot table but am having trouble preserving the order in which I sorted my original data frame. Below is sample code showing what my data looks like and how I made my pivot table.
simple_df = pd.DataFrame({'Skill…

Jasmine Koh
2
votes
2 answers
Data Lake: fix corrupted files on Ingestion vs ETL
Objective
I'm building a data lake; the general flow looks like NiFi -> Storage -> ETL -> Storage -> Data Warehouse.
The general rule for a data lake seems to be: no pre-processing at the ingestion stage. All subsequent processing should happen in ETL, so you…

VB_
1
vote
0 answers
Creating Data Lineage
In Apache Atlas, I am trying to model the data flow of different processes. The issue I am having is that some of these processes share common DataSets but I don't necessarily want the different processes I am modeling to appear to be connected to…

jason
1
vote
1 answer
Data governance Snowflake unload/copy to export data
I unload data from Snowflake to S3, or locally using SnowSQL.
I'd like to know if there's any kind of data tracing (for data governance) that always records or tags, and saves somewhere in Snowflake, the fact that data was unloaded.
Thanks

abdoulsn
1
vote
1 answer
How Can I Attach Policy Tags to columns using Python API
As part of data governance, we have created taxonomies and policy tags using the Python API. I am now trying to assign policy tags to the columns [Name, Age] of the table Project.Dataset.TMP_TBL.
I looked through the GCP documentation but couldn't find any…

VIJAY BURRA
1
vote
1 answer
Does Azure Purview have a data lineage API?
I know there are connectors for Purview that support data lineage data collection. However, I'm wondering if Purview has any sort of API that allows any data processing (ETL) process to write a lineage record/document to the Purview lineage…

GregH
1
vote
1 answer
Is it possible to get lineage metadata from the pipeline in my Data Fusion Action plugin?
I'm trying to get data lineage metadata like data source/schema and data target/schema in a custom Action plugin which gets executed after the successful run of the other steps in the pipeline.
I have a basic Action plugin that executes but I'm…

Vaughn
1
vote
0 answers
Azure Data Governance Solution approach for Data Lakes
I am evaluating how to implement a data governance solution with Azure Data Catalog for a data lake batch transformation pipeline. My approach is below. Any insights, please?
Data Factory can't capture the lineage from source to Data Lake.
I…

Cengiz
1
vote
0 answers
Data Lake governance tools
I am seeking advice on the data governance toolset(s) you currently use for a data lake, and your thoughts about those tools:
Managing data models - ingress/at rest/egress
Tracking data lineage - who is using what fields?
Migration changes

amp123
1
vote
0 answers
How to display HBase data-lineage in Apache Atlas?
I am testing the Apache Atlas data governance tool to display the data lineage of a NoSQL database.
I understand that HBase is the only supported NoSQL database as of now (input metadata source).
I've set up Apache Atlas 2.0 in an environment having…

Lorem
1
vote
1 answer
Will IGC allow me to trace where data has been sourced from or how data is being consumed, for any ETL or Data Transformation Tool?
As part of our governance initiative and regulatory requirements, we need to produce a lineage (traceability) report outlining the flow of data into our warehouse and the reports or services consuming its data. We are aware that Information…

Kevin Wei
0
votes
0 answers
How to apply data governance to API integration
I'm new to data governance and API integration, so forgive me if the question lacks some information.
We have an application integration team that will integrate some applications together. For example, an application creates some data in its application…

Yassine Abdul-Rahman
0
votes
1 answer
Azure Purview Data Lineage with Databricks
I am using Azure Purview for data governance and data lineage. We use Databricks in our data architecture, but there isn't any native support for capturing data lineage with Databricks.
I found the following links that will allow you to create…

Patterson