I have an XML file that contains:

 <?xml version="1.0" encoding="UTF-8" ?>
<Repository xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<DECLARE>
<PhysicalColumn name="Department" parentName="&quot;Sample App Lite Data&quot;...&quot;D20 Offices&quot;" parentId="3001:129" parentUid="80ca6538-0bb9-0000-714b-e31d00000000" id="3003:484" uid="80ca6539-0bbb-0000-714b-e31d00000000" dataType="VARCHAR" precision="20" extName="//Table/SAMP_OFFICES_D/DEPARTMENT" specialType="none">
<SourceColumn>
<RefPhysicalColumn id="3003:427" uid="80ca64f9-0bbb-0000-714b-e31d00000000" qualifiedName="&quot;Sample App Lite Data&quot;...&quot;SAMP_OFFICES_D&quot;.&quot;Department&quot;"/>
</SourceColumn>
</PhysicalColumn>

<LogicalTable name="D2 Offices" parentName="&quot;SampleApp Lite&quot;" parentId="2000:42377" parentUid="80cb6802-07d0-0000-714b-e31d00000000" id="2035:42562" uid="80cb68bb-07f3-0000-714b-e31d00000000" x="938" y="669">
<Description><![CDATA[This logical table maps to the physical Office Dimension table with various attributes.]]></Description>
<Columns>
<RefLogicalColumn id="2006:42563" uid="80cb68bc-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Office&quot;"/>
<RefLogicalColumn id="2006:42564" uid="80cb68bd-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Office Key&quot;"/>
<RefLogicalColumn id="2006:42565" uid="80cb68be-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Department&quot;"/>
<RefLogicalColumn id="2006:42566" uid="80cb68bf-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Dept Key&quot;"/>
<RefLogicalColumn id="2006:42567" uid="80cb68c0-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Organization&quot;"/>
<RefLogicalColumn id="2006:42568" uid="80cb68c1-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Org Key&quot;"/>
<RefLogicalColumn id="2006:42569" uid="80cb68c2-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Company&quot;"/>
<RefLogicalColumn id="2006:42570" uid="80cb68c3-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Company Key&quot;"/>
<RefLogicalColumn id="2006:42571" uid="80cb68c4-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Office Sequence&quot;"/>
</Columns>
<TableSources>
<RefLogicalTableSource id="2037:43058" uid="80cb6a2c-07f5-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;LTS1 Offices&quot;"/>
</TableSources>
</LogicalTable>

<LogicalTableSource name="LTS1 Offices" parentName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;" parentId="2035:42562" parentUid="80cb68bb-07f3-0000-714b-e31d00000000" id="2037:43058" uid="80cb6a2c-07f5-0000-714b-e31d00000000" isActive="true">
<Link>
<StartNode>
<RefPhysicalTable id="3001:129" uid="80ca6538-0bb9-0000-714b-e31d00000000" qualifiedName="&quot;Sample App Lite Data&quot;...&quot;D20 Offices&quot;"/>
</StartNode>
</Link>
<WhereClause>
<Expr></Expr>
</WhereClause>
<GroupBy>
<Expr><![CDATA[ GROUPBYLEVEL("SampleApp Lite"."H2 Offices"."Offices Detail")]]></Expr>
</GroupBy>
<FragmentContent>
<Expr></Expr>
</FragmentContent>
</LogicalTableSource>

<PresentationColumn name="Department" parentName="&quot;Sample Targets Lite&quot;..&quot;Offices&quot;" parentId="4008:43412" parentUid="80cb6c16-0fa8-0000-714b-e31d00000000" id="4010:43649" uid="80cb6d77-0faa-0000-714b-e31d00000000" hasDispName="false" hasDispDescription="false" overrideLogicalName="false">
<Description><![CDATA[Returns the Department description from the Office dimension. Naturally drills into Office Column.]]></Description>
<RefLogicalColumn id="2006:42565" uid="80cb68be-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Department&quot;"/>
</PresentationColumn>
</DECLARE>
</Repository>

From this file I need to find the source of a PresentationColumn, that is, the physical column name and the physical table, by following the different ids. For example, the PresentationColumn with name="Department" contains RefLogicalColumn id="2006:42565".

<**PresentationColumn name="Department"** parentName="&quot;Sample Targets Lite&quot;..&quot;Offices&quot;" parentId="4008:43412" parentUid="80cb6c16-0fa8-0000-714b-e31d00000000" id="4010:43649" uid="80cb6d77-0faa-0000-714b-e31d00000000" hasDispName="false" hasDispDescription="false" overrideLogicalName="false">
<Description><![CDATA[Returns the Department description from the Office dimension. Naturally drills into Office Column.]]></Description>
<**RefLogicalColumn id="2006:42565"** uid="80cb68be-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Department&quot;"/>
</PresentationColumn>

Using RefLogicalColumn id="2006:42565", we search the LogicalTable elements for a RefLogicalColumn with the same id.

<LogicalTable name="D2 Offices" parentName="&quot;SampleApp Lite&quot;" parentId="2000:42377" parentUid="80cb6802-07d0-0000-714b-e31d00000000" id="2035:42562" uid="80cb68bb-07f3-0000-714b-e31d00000000" x="938" y="669">
<Description><![CDATA[This logical table maps to the physical Office Dimension table with various attributes.]]></Description>
<Columns>
<RefLogicalColumn id="2006:42563" uid="80cb68bc-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Office&quot;"/>
<RefLogicalColumn id="2006:42564" uid="80cb68bd-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Office Key&quot;"/>
<**RefLogicalColumn id="2006:42565"** uid="80cb68be-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Department&quot;"/>
<RefLogicalColumn id="2006:42566" uid="80cb68bf-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Dept Key&quot;"/>
<RefLogicalColumn id="2006:42567" uid="80cb68c0-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Organization&quot;"/>
<RefLogicalColumn id="2006:42568" uid="80cb68c1-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Org Key&quot;"/>
<RefLogicalColumn id="2006:42569" uid="80cb68c2-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Company&quot;"/>
<RefLogicalColumn id="2006:42570" uid="80cb68c3-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Company Key&quot;"/>
<RefLogicalColumn id="2006:42571" uid="80cb68c4-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Office Sequence&quot;"/>
</Columns>
<TableSources>
<**RefLogicalTableSource id="2037:43058"** uid="80cb6a2c-07f5-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;LTS1 Offices&quot;"/>
</TableSources>
</LogicalTable>

Then, using RefLogicalTableSource id="2037:43058", we search the LogicalTableSource elements by id.

<LogicalTableSource name="LTS1 Offices" parentName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;" parentId="2035:42562" parentUid="80cb68bb-07f3-0000-714b-e31d00000000" **id="2037:43058"** uid="80cb6a2c-07f5-0000-714b-e31d00000000" isActive="true">
<Link>
<StartNode>
<**RefPhysicalTable id="3001:129"** uid="80ca6538-0bb9-0000-714b-e31d00000000" qualifiedName="&quot;Sample App Lite Data&quot;...&quot;D20 Offices&quot;"/>
</StartNode>
</Link>
<WhereClause>
<Expr></Expr>
</WhereClause>
<GroupBy>
<Expr><![CDATA[ GROUPBYLEVEL("SampleApp Lite"."H2 Offices"."Offices Detail")]]></Expr>
</GroupBy>
<FragmentContent>
<Expr></Expr>
</FragmentContent>
</LogicalTableSource>

Then, using RefPhysicalTable id="3001:129", we search the PhysicalColumn elements by parentId.

<PhysicalColumn name="Department" parentName="&quot;Sample App Lite Data&quot;...&quot;D20 Offices&quot;" **parentId="3001:129"** parentUid="80ca6538-0bb9-0000-714b-e31d00000000" id="3003:484" uid="80ca6539-0bbb-0000-714b-e31d00000000" dataType="VARCHAR" precision="20" extName="//Table/SAMP_OFFICES_D/DEPARTMENT" specialType="none">
<SourceColumn>
<RefPhysicalColumn id="3003:427" uid="80ca64f9-0bbb-0000-714b-e31d00000000" qualifiedName="&quot;Sample App Lite Data&quot;...&quot;SAMP_OFFICES_D&quot;.&quot;Department&quot;"/>
</SourceColumn>
</PhysicalColumn>

From here we need the PhysicalColumn name="Department" and its extName="//Table/SAMP_OFFICES_D/DEPARTMENT".

My first problem is converting the XML file to a data frame; the second is backtracking from the presentation column to its source.

ndmeiri
Milan6687

1 Answer

xml2::read_xml will help you read it. The second part will be harder, since it looks like you have 3 relational tables. See this page and possibly this, although when I tried it the result was messy because everything was combined into one table.

library(xml2)
library(tidyverse)  # provides %>% and purrr::map_df()
dfxml1 <- xml2::read_xml("C:/foo/bar.xml")  # parsed document used by the calls below

mcga <- function(tbl) {
  x <- colnames(tbl)
  x <- tolower(x)
  x <- gsub("[[:punct:][:space:]]+", "_", x)
  x <- gsub("_+", "_", x)
  x <- gsub("(^_|_$)", "", x)
  x <- make.unique(x, sep = "_")
  colnames(tbl) <- x
  tbl
}

dfxml2 <- xml_find_all(dfxml1, ".//*") %>% 
  map_df(~{
   xml_attrs(.x) %>% 
      as.list()
  }) %>% 
  mcga()

Or split them into 3 tables.

LogicalTable <- xml_find_all(dfxml1, ".//LogicalTable//*") %>% 
  map_df(~{
    xml_attrs(.x) %>% 
      as.list()
  }) %>% 
  mcga()

PhysicalTable <- xml_find_all(dfxml1, ".//PhysicalColumn") %>% 
  map_df(~{
    xml_attrs(.x) %>% 
      as.list()
  }) %>% 
  mcga()

LogTable <- xml_find_all(dfxml1, ".//LogicalTableSource//*") %>% 
  map_df(~{
    xml_attrs(.x) %>% 
      as.list()
  }) %>% 
  mcga()
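Attribute tables built this way do not record which top-level object a Ref* element sits under, which makes joining them awkward. As a rough sketch (not tested against a full repository export), the references could be collected into one "edge" table that keeps the owner's id next to the referenced id. The inline XML is a trimmed-down stand-in for the file in the question, and the names `xml_txt`, `doc`, `refs`, `owner` and `edges` are made up for this example.

```r
library(xml2)

# Trimmed-down stand-in for the XML in the question (most attributes dropped)
xml_txt <- '<Repository>
<DECLARE>
<PhysicalColumn name="Department" parentId="3001:129" id="3003:484" extName="//Table/SAMP_OFFICES_D/DEPARTMENT"/>
<LogicalTable name="D2 Offices" id="2035:42562">
<Columns><RefLogicalColumn id="2006:42565"/></Columns>
<TableSources><RefLogicalTableSource id="2037:43058"/></TableSources>
</LogicalTable>
<LogicalTableSource name="LTS1 Offices" id="2037:43058">
<Link><StartNode><RefPhysicalTable id="3001:129"/></StartNode></Link>
</LogicalTableSource>
<PresentationColumn name="Department" id="4010:43649">
<RefLogicalColumn id="2006:42565"/>
</PresentationColumn>
</DECLARE>
</Repository>'
doc <- read_xml(xml_txt)

# Every element whose name starts with "Ref", anywhere below a child of DECLARE
refs <- xml_find_all(doc, ".//DECLARE/*//*[starts-with(name(), 'Ref')]")

# For each reference, the enclosing object that sits directly under DECLARE
owner <- xml_find_first(refs, "ancestor::*[parent::DECLARE]")

# One row per reference: which object points at which id
edges <- data.frame(
  owner_type = xml_name(owner),
  owner_id   = xml_attr(owner, "id"),
  ref_type   = xml_name(refs),
  ref_id     = xml_attr(refs, "id"),
  stringsAsFactors = FALSE
)
print(edges)
```

With a table like this, each hop in the question becomes a filter or join on `ref_id` and `owner_id`.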

How did you want to track these?
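If the goal is just the chain described in the question (PresentationColumn, then LogicalTable, then LogicalTableSource, then PhysicalColumn by parentId), it could probably be followed with XPath directly, without building data frames first. The following is only a sketch under some assumptions: the inline XML is a trimmed-down stand-in for the real file, `trace_source()` and `ids_from()` are hypothetical helpers written for this example, and the last step returns every PhysicalColumn of the physical table, because the sample does not show how a logical column maps to one specific physical column.

```r
library(xml2)

# Trimmed-down stand-in for the XML in the question (most attributes dropped)
xml_txt <- '<Repository>
<DECLARE>
<PhysicalColumn name="Department" parentId="3001:129" id="3003:484" extName="//Table/SAMP_OFFICES_D/DEPARTMENT"/>
<LogicalTable name="D2 Offices" id="2035:42562">
<Columns><RefLogicalColumn id="2006:42565"/></Columns>
<TableSources><RefLogicalTableSource id="2037:43058"/></TableSources>
</LogicalTable>
<LogicalTableSource name="LTS1 Offices" id="2037:43058">
<Link><StartNode><RefPhysicalTable id="3001:129"/></StartNode></Link>
</LogicalTableSource>
<PresentationColumn name="Department" id="4010:43649">
<RefLogicalColumn id="2006:42565"/>
</PresentationColumn>
</DECLARE>
</Repository>'
doc <- read_xml(xml_txt)

# Hypothetical helper: run one XPath per key and collect the "id" attributes found
ids_from <- function(doc, xpath_fmt, keys) {
  unique(unlist(lapply(keys, function(k) {
    xml_attr(xml_find_all(doc, sprintf(xpath_fmt, k)), "id")
  })))
}

# Hypothetical helper: follow the id chain from a presentation column name
trace_source <- function(doc, pres_name) {
  # 1. PresentationColumn -> id of its RefLogicalColumn
  lc_id <- ids_from(doc, ".//PresentationColumn[@name='%s']/RefLogicalColumn", pres_name)
  # 2. LogicalTable listing that column -> id of its RefLogicalTableSource
  lts_id <- ids_from(doc,
    ".//LogicalTable[Columns/RefLogicalColumn/@id='%s']/TableSources/RefLogicalTableSource",
    lc_id)
  # 3. LogicalTableSource with that id -> id of its RefPhysicalTable
  pt_id <- ids_from(doc, ".//LogicalTableSource[@id='%s']//RefPhysicalTable", lts_id)
  # 4. All PhysicalColumn elements whose parentId is that physical table
  do.call(rbind, lapply(pt_id, function(k) {
    n <- xml_find_all(doc, sprintf(".//PhysicalColumn[@parentId='%s']", k))
    data.frame(physical_table_id = rep(k, length(n)),
               name    = xml_attr(n, "name"),
               extName = xml_attr(n, "extName"),
               stringsAsFactors = FALSE)
  }))
}

res <- trace_source(doc, "Department")
print(res)
```

On the real file a physical table will normally have many columns, so step 4 would need an extra filter. Matching on the column name is one option, but that the logical and physical names agree is an assumption that should be checked against the repository.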

Anonymous coward