
My data is linked through an Id/ParentId system, and I have managed to add correct integer levels. However, I would like to write a function that automatically nests my five-tier hierarchy as a pathString for data.tree.

Structure:

Id                 Name               ParentId           ParentName    Level
701F0000006Iw8E    'Paid Media'       NA                 NA            1
701F0000006IS1t    'Bing ABC'         701F0000006Iw8Y    'Bing'        3    
701F0000006IS28    'Bing DEF'         701F0000006Iw8Y    'Bing'        3
701F0000006IS23    'Bing GHI'         701F0000006Iw8Y    'Bing'        3
701F0000006Imq9    'Bing JKL'         701F0000006Iw8Y    'Bing'        3
701F0000006IS1y    'Bing MNO'         701F0000006Iw8Y    'Bing'        3
701F0000006Iw8Y    'Bing'             701F0000006Iw8E    'Paid Media'  2
701F0000006IvcW    'Google'           701F0000006Iw8E    'Paid Media'  2
7012A000006rhY8    'Adwords ABC'      701F0000006IvcW    'Google'      3
701F0000006IS1j    'Adwords DEF'      701F0000006IvcW    'Google'      3
701F0000006IS1o    'Adwords GHI'      701F0000006IvcW    'Google'      3
701F0000006IS1Z    'Adwords JKL'      701F0000006IvcW    'Google'      3
701F0000006Ieci    'Adwords MNO'      701F0000006IvcW    'Google'      3

Currently, the pathString I build captures only a single tier (the immediate parent):

dat$pathString <- paste(dat$ParentId, 
      dat$Id, 
      sep = "/")

Ex.

 "NA/701F0000000SOEq"

To populate the whole tree correctly, however, the string would need to include every ancestor:

 "NA/701F0000006Iw8E/701F0000006Iw8Y/701F0000006IS1t" for "Bing ABC"

Ideally, a single expression would work for all levels, but I understand if each level needs to be handled separately.

Full Id,ParentId system here: Dropbox Link

sgdata
  • I don't see how this data is different from the question you linked to at all. Did you at least try that code? What exactly didn't work? – MrFlick Apr 14 '17 at 21:51
  • In this case it's a character value rather than a numeric integer. In the previous question, he uses `ind+1` to step up through levels on each iteration of the `while`. I'm not sure how to do that with character values, i.e. `701F0000006Ieci + 1 = 701F0000006IvcW` and then `701F0000006IvcW + 1 = 701F0000006Iw8E`. – sgdata Apr 17 '17 at 14:24
  • Any reason why you need to do the path approach for data.trees? It seems simpler to just loop through and add the children? – Ian Wesley Apr 20 '17 at 22:11

1 Answer


Although your question asks for a path string, the tree can be built directly from your data frame format.

library(data.tree)
dat <- read.table(text="
Id                 Name               ParentId           ParentName    Level
701F0000006Iw8E    'Paid Media'       NA                 NA            1
701F0000006IS1t    'Bing ABC'         701F0000006Iw8Y    'Bing'        3
701F0000006IS28    'Bing DEF'         701F0000006Iw8Y    'Bing'        3
701F0000006IS23    'Bing GHI'         701F0000006Iw8Y    'Bing'        3
701F0000006Imq9    'Bing JKL'         701F0000006Iw8Y    'Bing'        3
701F0000006IS1y    'Bing MNO'         701F0000006Iw8Y    'Bing'        3
701F0000006Iw8Y    'Bing'             701F0000006Iw8E    'Paid Media'  2
701F0000006IvcW    'Google'           701F0000006Iw8E    'Paid Media'  2
7012A000006rhY8    'Adwords ABC'      701F0000006IvcW    'Google'      3
701F0000006IS1j    'Adwords DEF'      701F0000006IvcW    'Google'      3
701F0000006IS1o    'Adwords GHI'      701F0000006IvcW    'Google'      3
701F0000006IS1Z    'Adwords JKL'      701F0000006IvcW    'Google'      3
701F0000006Ieci    'Adwords MNO'      701F0000006IvcW    'Google'      3
", header=TRUE, stringsAsFactors = FALSE)

# The network build expects every row to be a (child, parent) pair,
# so the given root cannot have a missing parent: link it to a
# placeholder node named "tree_root" instead
dat$ParentId[is.na(dat$ParentId)] <- "tree_root"

# Build the tree using the network layout: pairs of node ids
# in the first two columns. Remaining columns are node attributes.
# (No NA filter is needed here, since the NAs were replaced above.)
dat_network <- dat[, c("Id", "ParentId", "Name")]
dat_tree <- FromDataFrameNetwork(dat_network, check = "check")

print(dat_tree, 'Name')

# 1  tree_root                              
# 2   °--701F0000006Iw8E          Paid Media
# 3       ¦--701F0000006Iw8Y            Bing
# 4       ¦   ¦--701F0000006IS1t    Bing ABC
# 5       ¦   ¦--701F0000006IS28    Bing DEF
# 6       ¦   ¦--701F0000006IS23    Bing GHI
# 7       ¦   ¦--701F0000006Imq9    Bing JKL
# 8       ¦   °--701F0000006IS1y    Bing MNO
# 9       °--701F0000006IvcW          Google
# 10          ¦--7012A000006rhY8 Adwords ABC
# 11          ¦--701F0000006IS1j Adwords DEF
# 12          ¦--701F0000006IS1o Adwords GHI
# 13          ¦--701F0000006IS1Z Adwords JKL
# 14          °--701F0000006Ieci Adwords MNO
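
If a pathString column is still wanted, as the question asks, one possible approach is to climb the ParentId links for each row until the root is reached. The sketch below is an illustration in base R, not part of data.tree or of the answer above: `build_path_strings` and its `max_depth` guard are hypothetical names, and it assumes that Id values are unique and that the root's ParentId is either NA or a value that does not appear in the Id column.

```r
# Hypothetical helper (not a data.tree function): build the full path for
# every row by following ParentId links from the node up to the root.
build_path_strings <- function(ids, parent_ids, sep = "/", max_depth = 100L) {
  # Named lookup vector: child id -> parent id
  parent_of <- setNames(parent_ids, ids)

  vapply(ids, function(id) {
    path <- id
    current <- parent_of[[id]]
    depth <- 0L
    # Climb until the parent is NA (root reached). max_depth is a guard
    # against accidental cycles in the data.
    while (!is.na(current) && depth < max_depth) {
      path <- c(current, path)
      # A parent that is not itself listed as an Id is treated as the top
      current <- if (current %in% ids) parent_of[[current]] else NA_character_
      depth <- depth + 1L
    }
    paste(path, collapse = sep)
  }, character(1), USE.NAMES = FALSE)
}

# Three rows taken from the question's data: root, one child, one grandchild
toy <- data.frame(
  Id       = c("701F0000006Iw8E", "701F0000006Iw8Y", "701F0000006IS1t"),
  ParentId = c(NA, "701F0000006Iw8E", "701F0000006Iw8Y"),
  stringsAsFactors = FALSE
)
toy$pathString <- build_path_strings(toy$Id, toy$ParentId)
print(toy$pathString)
```

Unlike the `ind+1` approach mentioned in the comments, this never does arithmetic on the ids; it only looks them up by name, so character ids work as they are. The resulting paths start at the root id rather than at "NA", and a data frame with such a column should be usable with `as.Node()` or `FromDataFrameTable()`. The lookup runs once per ancestor per row, so it may be slow on large tables.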
Andrew Lavers
  • I believe the concept is correct here but it is not working. Here is my [dropbox link](https://www.dropbox.com/s/ai4s65sfhjr4ymn/dat.RDS?dl=0) to the full list of Id and ParentIds. – sgdata Apr 21 '17 at 14:50
  • @gscott I downloaded your data and can build the tree. Can you clarify "not working" and provide some code you tried? Your downloaded data only has ids. Maybe you stripped the names for a reason, but it is hard to verify the tree without any other names. The tree operations are a little slow with this number of nodes, so I suggest you read the comments on performance in the data.tree documentation in case your real-world data is even larger than what was provided. – Andrew Lavers Apr 22 '17 at 12:13
  • Sorry @epi99 - I ended up approaching this another way but I really appreciate your answer. Basically I kept getting 'truncated' responses on my tree calls whenever I'd try to climb for parent nodes and things like that. And you're right about the timing - I'm using this in a shiny app so the responsiveness just won't work with my needed interactivity. – sgdata May 04 '17 at 14:24