In Apache Atlas, I am trying to model the data flow of different processes. The issue I am having is that some of these processes share common DataSets but I don't necessarily want the different processes I am modeling to appear to be connected to each other.

For example, in this lineage model, I want to show an XML data source file as the input to a process whose output is transferred to another computer.

{
  "entity": {
    "typeName": "datasystem_datatransfer",
    "attributes": {
      "id": "b75af137-9279-4c73-be9f-0e37b686dde5",
      "qualifiedName": "b75af137-9279-4c73-be9f-0e37b686dde5@datasystem_datatransfer",
      "displayName": "Data Transfer Use Case 1",
      "inputs": [
        {
          "uniqueAttributes": {"qualifiedName": "25b60fe5-891c-4c94-87ab-b075d838ec30@datasystem_datasource"},
          "typeName": "datasystem_datasource"
        }
      ],
      "outputs": [
        {
          "uniqueAttributes": {"qualifiedName": "21781e1b-4b94-435b-be0a-141776267c4e@datasystem_computer"},
          "typeName": "datasystem_computer"
        }
      ],
      "description": "Data transfer from Data Source to Computer.",
      "name": "dataEgressUseCase1"
    }
  }
}

This will create a model like this:

datasystem_datasource --> datasystem_datatransfer --> datasystem_computer
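
For reference, the same payload can be assembled programmatically. This is only a minimal sketch: the helper names `entity_ref` and `transfer_process` are hypothetical (they are not part of any Atlas client), and it assumes the `<uuid>@<typeName>` convention for `qualifiedName` used in the JSON above.

```python
import json


def entity_ref(qualified_name):
    """Reference an existing entity by its unique qualifiedName."""
    # Assumption: qualifiedName follows '<uuid>@<typeName>', as in the JSON above.
    type_name = qualified_name.split("@", 1)[1]
    return {
        "uniqueAttributes": {"qualifiedName": qualified_name},
        "typeName": type_name,
    }


def transfer_process(guid, display_name, name, description, inputs, outputs):
    """Build one datasystem_datatransfer process entity (hypothetical helper)."""
    type_name = "datasystem_datatransfer"
    return {
        "typeName": type_name,
        "attributes": {
            "id": guid,
            "qualifiedName": f"{guid}@{type_name}",
            "displayName": display_name,
            "inputs": [entity_ref(q) for q in inputs],
            "outputs": [entity_ref(q) for q in outputs],
            "description": description,
            "name": name,
        },
    }


use_case_1 = {
    "entity": transfer_process(
        guid="b75af137-9279-4c73-be9f-0e37b686dde5",
        display_name="Data Transfer Use Case 1",
        name="dataEgressUseCase1",
        description="Data transfer from Data Source to Computer.",
        inputs=["25b60fe5-891c-4c94-87ab-b075d838ec30@datasystem_datasource"],
        outputs=["21781e1b-4b94-435b-be0a-141776267c4e@datasystem_computer"],
    )
}

# Serialize to the JSON body shown above.
body = json.dumps(use_case_1, indent=2)
print(body)
```

Building the references from the `qualifiedName` keeps the `typeName` and the name suffix from drifting apart when the payload is edited by hand.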

I now have another process I want to model where I am using the same "datasystem_computer" but the process is a bit more complicated:

{
  "entities": [
    {
      "typeName": "datasystem_datatransfer",
      "attributes": {
        "id": "1305f6c4-f0da-4929-be21-dd0798dc2086",
        "qualifiedName": "1305f6c4-f0da-4929-be21-dd0798dc2086@datasystem_datatransfer",
        "displayName": "Data Transfer Use Case 2",
        "inputs": [
          {
            "uniqueAttributes": {"qualifiedName": "c72375fb-34a5-4a22-895c-0d55435fdf26@datasystem_datasource"},
            "typeName": "datasystem_datasource"
          }
        ],
        "outputs": [
          {
            "uniqueAttributes": {"qualifiedName": "21781e1b-4b94-435b-be0a-141776267c4e@datasystem_computer"},
            "typeName": "datasystem_computer"
          }
        ],
        "description": "Data Transfer from Data Source to PC.",
        "name": "dataEgressUseCase2"
      }
    },
    {
      "typeName": "datasystem_datatransfer",
      "attributes": {
        "id": "307e6f84-41af-482e-8641-39fa258e709d",
        "qualifiedName": "307e6f84-41af-482e-8641-39fa258e709d@datasystem_datatransfer",
        "displayName": "Data Transfer Use Case 2.5",
        "inputs": [
          {
            "uniqueAttributes": {"qualifiedName": "21781e1b-4b94-435b-be0a-141776267c4e@datasystem_computer"},
            "typeName": "datasystem_computer"
          }
        ],
        "outputs": [
          {
            "uniqueAttributes": {"qualifiedName": "5acddaca-6eb8-48f9-be75-fc757e442985@datasystem_datasource"},
            "typeName": "datasystem_datasource"
          }
        ],
        "description": "Data Transfer from Data Source to PC to Another PC.",
        "name": "dataEgressUseCase2.5"
      }
    }
  ]
}

This should create a lineage diagram like:

datasystem_datasource --> datasystem_datatransfer --> datasystem_computer --> datasystem_datatransfer --> datasystem_datasource

The problem is that when I create this second lineage, it changes the first lineage I created. The processes have different IDs, so I am not sure why creating the second lineage would affect the first. I realize that they share the same datasystem_computer node, but they are different processes. What am I doing wrong?
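
To make the overlap concrete, here is a minimal sketch of what I suspect is happening. It is not Atlas code: it assumes a lineage view is produced by walking input/output edges upstream and downstream from the starting entity, and the short labels (`datasource_A`, `computer`, and so on) are placeholders for the qualifiedNames above.

```python
from collections import defaultdict, deque

# process name -> (inputs, outputs); labels are illustrative placeholders
USE_CASE_1 = {"dataEgressUseCase1": (["datasource_A"], ["computer"])}
USE_CASE_2 = {
    "dataEgressUseCase2": (["datasource_B"], ["computer"]),
    "dataEgressUseCase2.5": (["computer"], ["datasource_C"]),
}


def build_edges(processes):
    """Turn process definitions into upstream and downstream adjacency maps."""
    upstream, downstream = defaultdict(set), defaultdict(set)
    for proc, (inputs, outputs) in processes.items():
        for src in inputs:
            downstream[src].add(proc)
            upstream[proc].add(src)
        for dst in outputs:
            downstream[proc].add(dst)
            upstream[dst].add(proc)
    return upstream, downstream


def walk(start, edges):
    """Breadth-first walk; returns every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def lineage(start, processes):
    """Everything upstream plus everything downstream of start."""
    upstream, downstream = build_edges(processes)
    return walk(start, upstream) | walk(start, downstream)


before = lineage("dataEgressUseCase1", USE_CASE_1)
after = lineage("dataEgressUseCase1", {**USE_CASE_1, **USE_CASE_2})
print(sorted(before))
print(sorted(after))
```

Under that assumption, the first process's lineage grows as soon as any other process reads from or writes to the shared `computer` entity, even though the process entities themselves have different IDs.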
