
I am learning Apache Atlas and trying to find a way to import metadata from an RDBMS such as SQL Server or PostgreSQL.

Could somebody provide references or steps for doing this?

I am running Atlas in Docker with the built-in HBase and Solr. The intention is to import metadata from AWS RDS.

Update 1: To rephrase my question, can we import metadata directly from RDS SQL Server or PostgreSQL without importing the actual data into Hive (Hadoop)?

Any comments or answers are appreciated. Thank you!

Vadim Kotov
Irshad Ali

3 Answers


AFAIK, Atlas works on the Hive metastore.

Below is the AWS documentation on how to do it in AWS EMR while creating the cluster itself: ... Metadata classification, lineage, and discovery using Apache Atlas on Amazon EMR


Here is a Cloudera source from the Sqoop standpoint.

From the Cloudera community question "Populate metadata repository from RDBMS in Apache Atlas":

1) Create the new types in Atlas. For example, in the case of Oracle, an Oracle table type, column type, etc.
2) Create a script or process that pulls the metadata from the source metadata store.
3) Once you have the metadata you want to store in Atlas, your process creates the associated Atlas entities, based on the new types, using the Java API or JSON representations sent directly through the REST API. If you want, you can add lineage as you store the new entities.
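
The three steps above can be sketched in Python. This is only a minimal sketch, not the code from the Cloudera answer. It assumes the column metadata has already been fetched (for PostgreSQL, for example from `information_schema.columns`), and it assumes types named `rdbms_db`, `rdbms_table`, and `rdbms_column` exist in the Atlas instance; if they do not, they would be created in step 1. The function name `build_entities`, the `qualifiedName` format, and the sample rows are hypothetical.

```python
import json

# Query one could run against PostgreSQL to pull column metadata (step 2).
# Shown for context only; it is not executed in this sketch.
COLUMNS_SQL = """
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY table_schema, table_name, ordinal_position
"""


def build_entities(db_name, rows):
    """Turn (schema, table, column, data_type) rows into an Atlas bulk payload.

    Type and attribute names (rdbms_db, rdbms_table, rdbms_column, db, table,
    data_type) are assumptions; check them against the typedefs in your Atlas.
    """
    next_guid = -1

    def new_guid():
        # Negative guids are temporary placeholders within a single request.
        nonlocal next_guid
        guid = str(next_guid)
        next_guid -= 1
        return guid

    db_guid = new_guid()
    entities = [{
        "typeName": "rdbms_db",
        "guid": db_guid,
        "attributes": {"name": db_name, "qualifiedName": db_name},
    }]
    table_guids = {}
    for schema, table, column, data_type in rows:
        table_qn = f"{db_name}.{schema}.{table}"
        if table_qn not in table_guids:
            # First time this table is seen: emit one table entity for it.
            table_guids[table_qn] = new_guid()
            entities.append({
                "typeName": "rdbms_table",
                "guid": table_guids[table_qn],
                "attributes": {
                    "name": table,
                    "qualifiedName": table_qn,
                    "db": {"guid": db_guid, "typeName": "rdbms_db"},
                },
            })
        entities.append({
            "typeName": "rdbms_column",
            "guid": new_guid(),
            "attributes": {
                "name": column,
                "qualifiedName": f"{table_qn}.{column}",
                "data_type": data_type,
                "table": {"guid": table_guids[table_qn], "typeName": "rdbms_table"},
            },
        })
    return {"entities": entities}


# Hypothetical rows, shaped like the result of COLUMNS_SQL.
sample_rows = [
    ("public", "customers", "id", "integer"),
    ("public", "customers", "email", "text"),
    ("public", "orders", "id", "integer"),
]
payload = build_entities("salesdb", sample_rows)
print(json.dumps(payload, indent=2))
```

A payload built this way contains only descriptions of databases, tables, and columns, so registering it through the REST API would not require copying any table data into Hive.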


The documentation below has step-by-step details on how to use Sqoop to move data from any RDBMS to Hive.

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_data-access/content/using_sqoop_to_move_...

You can refer to this as well: http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_literal_sqoop_import_all_tables_literal

To get the metadata of all this Sqoop-imported data into Atlas, make sure the configurations described below are set properly.

http://atlas.incubator.apache.org/Bridge-Sqoop.html

Please note the above configuration step is not needed if your cluster configuration is managed by Ambari.

Ram Ghadiyaram
  • Thank you! I know that Atlas uses HBase to store metadata, but is it necessary to import the database into Hive/HBase as well? I want to use Apache Ranger to enforce authorization, data masking, etc., so at least for now I don't see any use in importing data from the RDBMS. It looks like a duplicate data store. Any comment is welcome! – Irshad Ali May 08 '20 at 05:35
  • This question is becoming too generic. Please be specific and ask a new question with exact requirements. Otherwise it will be marked as a low-quality question by SO. – Ram Ghadiyaram May 11 '20 at 18:52

Using the REST API is a good way to expose MySQL metadata in the Atlas catalog. Other options are Spark with Hive support (Spark -> read MySQL using JDBC -> write into Hive) or Sqoop.

To help create RDBMS-related entities (instances, databases, tables, columns), I have created a GitHub repository containing a template that shows how to add RDBMS or MySQL entities to Atlas:

https://github.com/vettrivikas/Apche-Atlas-for-RDBMS

Miguel

We can use the REST API to create a type and then send data to it.

Let's say I have a dashboard with a visualization on it. I can create a type definition and then push data to it:

{
    "entityDefs": [
        {
            "superTypes": [
                "DataSet"
            ],
            "name": "Dashboard",
            "description": "The definition of a Dashboard",
            "attributeDefs": [
                {
                    "name": "name",
                    "typeName": "string",
                    "isOptional": true,
                    "cardinality": "SINGLE",
                    "valuesMinCount": -1,
                    "valuesMaxCount": 1,
                    "isUnique": false,
                    "isIndexable": false,
                    "includeInNotification": false,
                    "searchWeight": -1
                },
                {
                    "name": "childDataset",
                    "typeName": "array<Visualization>",
                    "isOptional": true,
                    "cardinality": "SET",
                    "valuesMinCount": 0,
                    "valuesMaxCount": 2147483647,
                    "isUnique": false,
                    "isIndexable": false,
                    "includeInNotification": false,
                    "searchWeight": -1
                }
            ]
        },
        {
            "superTypes": [
                "DataSet"
            ],
            "name": "Visualization",
            "description": "The definition of a Visualization",
            "attributeDefs": [
                {
                    "name": "name",
                    "typeName": "string",
                    "isOptional": true,
                    "cardinality": "SINGLE",
                    "valuesMinCount": -1,
                    "valuesMaxCount": 1,
                    "isUnique": false,
                    "isIndexable": false,
                    "includeInNotification": false,
                    "searchWeight": -1
                },
                {
                    "name": "parentDataset",
                    "typeName": "array<Dashboard>",
                    "isOptional": true,
                    "cardinality": "SET",
                    "valuesMinCount": 0,
                    "valuesMaxCount": 2147483647,
                    "isUnique": false,
                    "isIndexable": false,
                    "includeInNotification": false,
                    "searchWeight": -1
                }
            ]
        }
    ],
    "relationshipDefs": [
        {
            "category": "RELATIONSHIP",
            "name": "dashboards_visualization_assignment",
            "description": "The relationship between a Dashboard and a Visualization",
            "relationshipCategory": "ASSOCIATION",
            "attributeDefs": [],
            "propagateTags": "NONE",
            "endDef1": {
                "type": "Dashboard",
                "name": "childDataset",
                "isContainer": false,
                "cardinality": "SET",
                "isLegacyAttribute": false
            },
            "endDef2": {
                "type": "Visualization",
                "name": "parentDataset",
                "isContainer": false,
                "cardinality": "SET",
                "isLegacyAttribute": false
            }
        }
    ]
}
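
The answer does not say where the type definition is sent. A minimal sketch, assuming the Atlas v2 typedefs endpoint `/api/atlas/v2/types/typedefs`, HTTP basic authentication, and a server at `http://localhost:21000` with `admin`/`admin` credentials (all of these are assumptions to adjust for your setup). The helper name `build_typedef_request` is hypothetical, and the typedef body is shortened here to keep the sketch small.

```python
import base64
import json
import urllib.request


def build_typedef_request(base_url, typedefs, user, password):
    """Build (but do not send) a POST request that registers type definitions."""
    body = json.dumps(typedefs).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/api/atlas/v2/types/typedefs",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Shortened stand-in for the full type definition JSON shown above.
typedefs = {
    "entityDefs": [
        {"name": "Dashboard", "superTypes": ["DataSet"], "attributeDefs": []}
    ]
}
request = build_typedef_request("http://localhost:21000", typedefs, "admin", "admin")
print(request.get_method(), request.full_url)

# urllib.request.urlopen(request) would actually send it; it is left out here
# so the sketch runs without a live Atlas server.
```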

Then you can add data with a POST request to {servername}:{port}/api/atlas/v2/entity/bulk:

{
    "entities": [
        {
            "typeName": "Dashboard",
            "guid": "-1000",
            "createdBy": "admin",
            "attributes": {
                "name": "sample dashboard",
                "childDataset": [
                    {
                        "guid": "-200",
                        "typeName": "Visualization"
                    }
                ]
            }
        }
    ],
    "referredEntities": {
        "-200": {
            "guid": "-200",
            "typeName": "Visualization",
            "attributes": {
                "qualifiedName": "bar-chart"
            }
        }
    }
}
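
The negative `guid` values in this payload act as temporary placeholders that link an entity to the ones it references within the same request. A small pre-flight check could confirm that every placeholder referenced from an attribute is defined somewhere in the payload before it is sent. This is a sketch only; the helper name `unresolved_placeholders` is illustrative and not part of Atlas.

```python
def unresolved_placeholders(payload):
    """Return referenced negative guids that no entity in the payload defines."""
    referred = payload.get("referredEntities", {})
    defined = {str(e["guid"]) for e in payload.get("entities", []) if "guid" in e}
    defined |= {str(guid) for guid in referred}

    referenced = set()

    def walk(value):
        # An object reference is a dict with a guid but no attributes of its own.
        if isinstance(value, dict):
            if "guid" in value and "attributes" not in value:
                referenced.add(str(value["guid"]))
            for child in value.values():
                walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)

    for entity in list(payload.get("entities", [])) + list(referred.values()):
        walk(entity.get("attributes", {}))

    return sorted(g for g in referenced if g.startswith("-") and g not in defined)


dashboard_payload = {
    "entities": [{
        "typeName": "Dashboard",
        "guid": "-1000",
        "attributes": {
            "name": "sample dashboard",
            "childDataset": [{"guid": "-200", "typeName": "Visualization"}],
        },
    }],
    "referredEntities": {
        "-200": {
            "guid": "-200",
            "typeName": "Visualization",
            "attributes": {"qualifiedName": "bar-chart"},
        }
    },
}

# Same payload with the referred entity missing, to show a failing case.
broken_payload = {"entities": dashboard_payload["entities"], "referredEntities": {}}

print(unresolved_placeholders(dashboard_payload))
print(unresolved_placeholders(broken_payload))
```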

Now look for the entities in Atlas.

(Screenshot: Dashboard entity in Atlas)

s_mj