
This is more of a question about how to organise long R scripts. I have plenty of very long scripts, and I often find myself importing one raw dataset, deriving other datasets from it, and so on, with each one used for a different aspect of the analysis. In effect, the original dataset branches off into others. With long scripts it can be quite difficult to trace the origin of the different branches. Does anyone have techniques for dealing with this, i.e. for getting an overview of how datasets are derived from one another? Some kind of visualisation tool, perhaps?

Sebastian Zeki

1 Answer


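With the DiagrammeR package, you can build a flow diagram incrementally alongside your analysis code and render it whenever you like with render_graph(). It can get unwieldy if you are not diligent, though: as the trivial example below shows, every object and operation needs its own add_node call plus matching edges.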

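# Written against the DiagrammeR API as it stood in 2015;
# later releases changed the add_node/add_edges/create_edges signatures
library(DiagrammeR)
# Create an empty graph
graph <- create_graph()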

# Create a simple data frame of individuals with random ages
df <- data.frame(id = 1:100, age = rnorm(100, 40, 5))
head(df)
# Add a node for df, df$id, and df$age
graph <- add_node(graph, node = "df")
graph <- add_node(graph, node = "df$age")
graph <- add_node(graph, node = "df$id")

# Vector of breaks for cut
breaks <- c(0, seq(20, 60, by = 5), Inf)
# Add a node for breaks
graph <- add_node(graph, node = "breaks")

# Create df.cut data frame of age intervals
df.cut <- data.frame(id = df$id,
                     interval = cut(df$age, breaks = breaks))

# Add nodes for df.cut, data.frame, cut
# Use a different node shape for operations
graph <- add_node(graph, 
                  node = "df.cut")
graph <- add_node(graph, 
                  node = "data.frame", 
                  shape = "square")
graph <- add_node(graph, 
                  node = "cut", 
                  shape = "square")

# Add edges for df$id, df$age
# Use different arrowhead to indicate operation
graph <- add_edges(graph,
                   create_edges(
                     from = c("df","df"),
                     to = c("df$id","df$age"),
                     rel = "to_get",
                     arrowhead = "box")
)

# Add edges for cut 
graph <- add_edges(graph, 
                   from = c("df$age", "breaks", "cut"),
                   to = c("cut", "cut", "df.cut"),
                   rel = c("to_get","to_get", "to_get"))

# Add edges for data.frame
graph <- add_edges(graph, 
                   from = c("df$id", "cut", "data.frame"),
                   to = c("data.frame", "data.frame", "df.cut"),
                   rel = c("to_get","to_get", "to_get"))

render_graph(graph)

DiagrammeR graph of simple data manipulation in R
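
The node-by-node bookkeeping above grows quickly. One way to keep it shorter, sketched here under assumptions, is to log each derivation as a row in a plain edge list and only build the diagram at the end. The helper names new_lineage, record_step and lineage_to_dot are hypothetical (they are not DiagrammeR functions). The sketch uses base R only and emits Graphviz DOT text, which DiagrammeR's grViz() should be able to render.

```r
# Hypothetical helpers (not part of DiagrammeR): keep lineage as a plain edge list.
new_lineage <- function() {
  data.frame(from = character(), to = character(), via = character(),
             stringsAsFactors = FALSE)
}

# Record that `output` was built from `inputs` using the operation `via`.
record_step <- function(lineage, inputs, output, via) {
  rbind(lineage,
        data.frame(from = inputs, to = output, via = via,
                   stringsAsFactors = FALSE))
}

# Turn the edge list into Graphviz DOT text, labelling each edge with its operation.
lineage_to_dot <- function(lineage) {
  edges <- sprintf('  "%s" -> "%s" [label = "%s"];',
                   lineage$from, lineage$to, lineage$via)
  paste(c("digraph lineage {", edges, "}"), collapse = "\n")
}

# Same toy data as in the example above
df <- data.frame(id = 1:100, age = rnorm(100, 40, 5))
breaks <- c(0, seq(20, 60, by = 5), Inf)
df.cut <- data.frame(id = df$id, interval = cut(df$age, breaks = breaks))

# One call per derived object instead of several add_node/add_edges calls
lineage <- new_lineage()
lineage <- record_step(lineage, c("df", "breaks"), "df.cut", "cut + data.frame")

dot <- lineage_to_dot(lineage)
cat(dot, "\n")
# The DOT text could then be rendered, e.g. with DiagrammeR::grViz(dot)
```

The trade-off is granularity: this records one edge per input object rather than separate nodes for columns and operations, which is usually enough to see where a dataset came from.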

DrPositron
  • Hmmmmm. Not bad. I guess I'd have to be selective to avoid code bloat. I suppose ideally I was looking for something to pull out the datasets and link them up for me. Perhaps making me do it is better. Not sure. – Sebastian Zeki Dec 13 '15 at 16:08
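
For the "pull out the datasets and link them up for me" idea in the comment above, a rough starting point is to parse the script and inspect its top-level assignments. The sketch below is an assumption-laden illustration, not an established tool: script_lineage is a hypothetical helper that uses only base R (parse and all.vars). It only sees plain `name <- value` or `name = value` statements at top level, so it will miss assign(), assignments inside functions or loops, and replacement forms such as `df$x <- ...`, and it can report a false link when a column name matches an earlier object name.

```r
# Hypothetical helper: for each top-level assignment in `code`, list the
# previously assigned objects that appear on its right-hand side.
script_lineage <- function(code) {
  exprs <- parse(text = code, keep.source = FALSE)
  known <- character()
  edges <- data.frame(from = character(), to = character(),
                      stringsAsFactors = FALSE)
  for (i in seq_along(exprs)) {
    e <- exprs[[i]]
    # Only handle simple assignments to a bare name
    is_assign <- is.call(e) &&
      (identical(e[[1]], as.name("<-")) || identical(e[[1]], as.name("="))) &&
      is.name(e[[2]])
    if (!is_assign) next
    target <- as.character(e[[2]])
    # Symbols on the right-hand side that were assigned earlier in the script
    deps <- setdiff(intersect(all.vars(e[[3]]), known), target)
    if (length(deps) > 0) {
      edges <- rbind(edges,
                     data.frame(from = deps, to = target,
                                stringsAsFactors = FALSE))
    }
    known <- union(known, target)
  }
  edges
}

# The data steps from the answer, as script text
code <- '
df <- data.frame(id = 1:100, age = rnorm(100, 40, 5))
breaks <- c(0, seq(20, 60, by = 5), Inf)
df.cut <- data.frame(id = df$id, interval = cut(df$age, breaks = breaks))
'
edges <- script_lineage(code)
print(edges)
```

To run it on a real script, the text could be read with readLines() and passed in as a character vector. The resulting from/to table is an edge list that can be handed to whichever graphing tool you prefer.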