I am trying to create a ragged list in R that corresponds to the D3 tree structure of flare.json. My data is in a data.frame:

path <- data.frame(P1=c("direct","direct","organic","direct"),
P2=c("direct","direct","end","end"),
P3=c("direct","organic","",""),
P4=c("end","end","",""), size=c(5,12,23,45))

path
       P1     P2      P3  P4 size
1  direct direct  direct end    5
2  direct direct organic end   12
3 organic    end               23
4  direct    end               45

but it could also be a list or reshaped if necessary:

path <- list()
path[[1]] <- list(name=c("direct","direct","direct","end"),size=5)
path[[2]] <- list(name=c("direct","direct","organic","end"), size=12)
path[[3]] <- list(name=c("organic", "end"), size=23)
path[[4]] <- list(name=c("direct", "end"), size=45)
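
The list form does not have to be typed by hand; it can be derived from the data frame. A minimal sketch, assuming empty strings (or NA) mark unused path columns. The names `path_df`, `to_paths` and `path_list` are hypothetical and used only for illustration:

```r
# The data frame from above, with strings kept as characters
path_df <- data.frame(
  P1 = c("direct", "direct", "organic", "direct"),
  P2 = c("direct", "direct", "end", "end"),
  P3 = c("direct", "organic", "", ""),
  P4 = c("end", "end", "", ""),
  size = c(5, 12, 23, 45),
  stringsAsFactors = FALSE
)

# Turn each row into list(name = <steps of the path>, size = <size>)
to_paths <- function(df, size_col = "size") {
  step_cols <- setdiff(names(df), size_col)
  lapply(seq_len(nrow(df)), function(i) {
    # Collect the step names of row i as a plain character vector
    steps <- vapply(step_cols, function(col) as.character(df[[col]][i]),
                    character(1), USE.NAMES = FALSE)
    # Drop the padding cells (empty strings or NA)
    steps <- steps[!is.na(steps) & steps != ""]
    list(name = steps, size = df[[size_col]][i])
  })
}

path_list <- to_paths(path_df)
str(path_list[[3]])
```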

The desired output is:

rl <- list()
rl <- list(name="root", children=list())
rl$children[1] <- list(list(name="direct", children=list()))
rl$children[[1]]$children[1] <- list(list(name="direct", children=list()))
rl$children[[1]]$children[[1]]$children[1] <- list(list(name="direct", children=list()))
rl$children[[1]]$children[[1]]$children[[1]]$children[1] <- list(list(name="end", size=5))

rl$children[[1]]$children[[1]]$children[2] <- list(list(name="organic", children=list()))
rl$children[[1]]$children[[1]]$children[[2]]$children[1] <- list(list(name="end",    size=12))

rl$children[[1]]$children[2] <- list(list(name="end", size=23))

rl$children[2] = list(list(name="organic", children=list()))
rl$children[[2]]$children[1] <- list(list(name="end", size=45))

So when I print it as JSON, I get:

require(RJSONIO)
cat(toJSON(rl, pretty=T))

 {
"name" : "root",
"children" : [
    {
        "name" : "direct",
        "children" : [
            {
                "name" : "direct",
                "children" : [
                    {
                        "name" : "direct",
                        "children" : [
                            {
                                "name" : "end",
                                "size" : 5
                            }
                        ]
                    },
                    {
                        "name" : "organic",
                        "children" : [
                            {
                                "name" : "end",
                                "size" : 12
                            }
                        ]
                    }
                ]
            },
            {
                "name" : "end",
                "size" : 23
            }
        ]
    },
    {
        "name" : "organic",
        "children" : [
            {
                "name" : "end",
                "size" : 45
            }
        ]
    }
]
}

I am having a hard time wrapping my head around the recursive steps needed to create this list structure in R. In JS I can move around the nodes fairly easily and, at each node, decide whether to add a new node or keep moving down the tree, using push as needed, e.g. new = {"name": node, "children": []}; or new = {"name": node, "size": size}; as in this example. I tried to split the data.frame as in this example:

 makeList<-function(x){
   if(ncol(x)>2){
      listSplit<-split(x,x[1],drop=T)
      lapply(names(listSplit),function(y){list(name=y,children=makeList(listSplit[[y]]))})
   } else {
      lapply(seq(nrow(x[1])),function(y){list(name=x[,1][y],size=x[,2][y])})
   }
 }

 jsonOut<-toJSON(list(name="root",children=makeList(path)))

but it gives me an error:

 Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
 Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?
  • What does your data frame look like? – AmeliaBR Feb 05 '14 at 19:49
  • @AmeliaBR I added the output of the data frame. – Michael Whitaker Feb 05 '14 at 20:32
  • Okay. Yes, you had the information there but I couldn't figure out how it represented a tree structure. Now I see: each row represents a path from root to leaf. [Here's another d3/javascript example of a similar data structure](http://stackoverflow.com/a/20964688/3128209), I'll try to re-write it in R. – AmeliaBR Feb 05 '14 at 21:50

2 Answers

The function given in the linked Q&A is essentially what you need; however, it was failing on your data set because of the empty (null) values in the later columns of some rows. Instead of blindly repeating the recursion until you run out of columns, check for your "end" value and use it to switch to making leaves:

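makeList <- function(x) {
    # Group the remaining columns by the value in the first column
    listSplit <- split(x[-1], x[1], drop = TRUE)
    lapply(names(listSplit), function(y) {
        rows <- listSplit[[y]]
        if (y == "end") {
            # Leaf node: paths are assumed to be unique, so there is
            # exactly one row here and its size belongs to this leaf
            list(name = y, size = rows[1, "size"])
        } else {
            # Internal node: recurse on the columns that are left
            list(name = y, children = makeList(rows))
        }
    })
}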
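
A usage sketch, assuming the question's `path` data frame with empty strings in the unused columns. The function is repeated here in condensed form so the block runs on its own. Note that the split is done on `x[-1]` rather than on `x`, so each recursive call sees one column fewer and the recursion is guaranteed to stop:

```r
path <- data.frame(
  P1 = c("direct", "direct", "organic", "direct"),
  P2 = c("direct", "direct", "end", "end"),
  P3 = c("direct", "organic", "", ""),
  P4 = c("end", "end", "", ""),
  size = c(5, 12, 23, 45),
  stringsAsFactors = FALSE
)

makeList <- function(x) {
  # x[-1] drops the column we split on, shrinking the frame each call
  listSplit <- split(x[-1], x[1], drop = TRUE)
  lapply(names(listSplit), function(y) {
    rows <- listSplit[[y]]
    if (y == "end") {
      # Leaf: assumes unique paths, i.e. a single row per leaf
      list(name = y, size = rows[1, "size"])
    } else {
      list(name = y, children = makeList(rows))
    }
  })
}

# Wrap the result in a root node, as flare.json does
tree <- list(name = "root", children = makeList(path))
str(tree, max.level = 3)

# With RJSONIO installed, the tree can then be printed as JSON:
# cat(RJSONIO::toJSON(tree, pretty = TRUE))
```

Children come out in alphabetical order of their names, because `split` groups by factor levels.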
AmeliaBR
  • Many thanks @AmeliaBR. This works very well. I was futzing around with the empty cells instead of doing something with the "end" value as per your solution. – Michael Whitaker Feb 06 '14 at 20:17
  • Yeah, the null values seem to really throw off the list split function results, so best to intervene before you get there, since your data already has a clear definition of the "leaf" position. – AmeliaBR Feb 06 '14 at 21:46
I believe this does what you want, though it has some limitations. In particular, it assumes that every branch in your tree is unique (i.e. no two rows in your data frame are equal in every column other than size):

df.split <- function(p.df) {
  p.lst.tmp <- unname(split(p.df, p.df[, 1]))
  p.lst <- lapply(
    p.lst.tmp, 
    function(x) {
      if(ncol(x) == 2L && nrow(x) == 1L) {
        return(list(name=x[1, 1], size=unname(x[, 2])))
      } else if (isTRUE(is.na(unname(x[ ,2])))) {
        return(list(name=x[1, 1], size=unname(x[, ncol(x)])))
      }
      list(name=x[1, 1], children=df.split(x[, -1, drop=FALSE]))
    }
  )
  p.lst
}
all.equal(rl, df.split(path)[[1]])
# [1] TRUE

Note, though, that you had the organic size switched, so I had to fix your rl to get this result (rl has it as 45, but your path has it as 23). I also modified your path data.frame slightly:

path <- data.frame(
  root=rep("root", 4),
  P1=c("direct","direct","organic","direct"),
  P2=c("direct","direct","end","end"),
  P3=c("direct","organic",NA,NA),
  P4=c("end","end",NA,NA), 
  size=c(5,12,23,45), 
  stringsAsFactors=FALSE
)
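
The modified data frame can also be derived from the original one rather than retyped. A sketch, assuming the original uses empty strings for the missing steps; `path_orig`, `path_mod` and `step_cols` are illustrative names, not part of the question:

```r
# The data frame as given in the question
path_orig <- data.frame(
  P1 = c("direct", "direct", "organic", "direct"),
  P2 = c("direct", "direct", "end", "end"),
  P3 = c("direct", "organic", "", ""),
  P4 = c("end", "end", "", ""),
  size = c(5, 12, 23, 45),
  stringsAsFactors = FALSE
)

# Prepend a constant "root" column (recycled to every row)
path_mod <- data.frame(root = "root", path_orig, stringsAsFactors = FALSE)

# Replace empty strings with NA in the step columns only
step_cols <- setdiff(names(path_mod), "size")
path_mod[step_cols] <- lapply(path_mod[step_cols], function(col) {
  col[col == ""] <- NA
  col
})

path_mod
```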

WARNING: I haven't tested this with other structures, so it's possible it will hit corner cases that you'll need to debug.
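
For comparison, the JavaScript-style approach described in the question (walk down the tree and add a node whenever one is missing) can also be sketched in R. This is an illustrative alternative rather than part of either answer; `insert_path` and `build_tree` are hypothetical names, and the input is assumed to be the list form of the paths from the question. Because R lists are copied on modification, each updated child has to be assigned back into its parent:

```r
# Insert one path (names ending in a leaf) below `node`
insert_path <- function(node, steps, size) {
  name <- steps[1]
  if (length(steps) == 1L) {
    # Last step: append a leaf that carries the size
    node$children[[length(node$children) + 1L]] <- list(name = name, size = size)
    return(node)
  }
  # Look for an existing internal child with this name
  idx <- which(vapply(node$children, function(ch) {
    ch$name == name && !is.null(ch$children)
  }, logical(1)))
  if (length(idx) == 0L) {
    # Not there yet: "push" a new internal node
    node$children[[length(node$children) + 1L]] <- list(name = name, children = list())
    idx <- length(node$children)
  }
  # Recurse, then store the modified copy back into the parent
  node$children[[idx[1]]] <- insert_path(node$children[[idx[1]]], steps[-1], size)
  node
}

build_tree <- function(paths) {
  root <- list(name = "root", children = list())
  for (p in paths) root <- insert_path(root, p$name, p$size)
  root
}

paths <- list(
  list(name = c("direct", "direct", "direct", "end"), size = 5),
  list(name = c("direct", "direct", "organic", "end"), size = 12),
  list(name = c("organic", "end"), size = 23),
  list(name = c("direct", "end"), size = 45)
)

tree <- build_tree(paths)
str(tree, max.level = 3)
```

Unlike the split-based functions, this keeps children in the order in which they are first encountered, and it does not need the rows padded to equal length.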

BrodieG
  • Thanks @BrodieG - this works great. Yes, every path is unique; otherwise it would have been summed up already in an existing path. I also appreciate that you changed the input data frame format to suit your function. – Michael Whitaker Feb 06 '14 at 19:49