
Given an absolute path to a root directory, how do I generate a dendrogram object of all paths below it so that I can visualize the directory tree in R?

Suppose the following call returned these leaf nodes:

list.files(path, full.names = TRUE, recursive = TRUE)

root/a/some/file.R
root/a/another/file.R
root/a/another/cool/file.R
root/b/some/data.csv
root/b/more/data.csv

I'd like to make a plot in R that looks like the output of the Unix tree program:

root
├── a
│   ├── another
│   │   ├── cool
│   │   │   └── file.R
│   │   └── file.R
│   └── some
│       └── file.R
└── b
    ├── more
    │   └── data.csv
    └── some
        └── data.csv

It would be especially useful if the solution involved decomposing the file system tree into two data.frames:

  1. a table of nodes (with which I could include attributes such as modification date)
  2. and a table of edges (also with attributes)

And then building the dendrogram object from those two data.frames.

wdkrnls
  • Exactly what type of plot did you have in mind? Can you show an example of how you want the data formatted and how you will plot your dendrogram? Anything to help make the problem more [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) would be helpful. – MrFlick Mar 18 '16 at 21:37
  • A simple hierarchical tree plot would be a great first step. But I am hoping to make a tree map as well. – wdkrnls Mar 18 '16 at 21:42
  • And I'd like to color attributes such as modified date. – wdkrnls Mar 18 '16 at 21:43
  • This is currently all so hypothetical. It would help if you could make it concrete. Is the problem reading the file system? Is the problem the plotting? If it's both it would be easier to break it into two parts (perhaps separate questions). Provide desired data or a sample reference plot. – MrFlick Mar 18 '16 at 21:46
  • Are you using R under Linux? – Stéphane Laurent Mar 14 '17 at 15:48
  • I am using Linux. – wdkrnls Mar 14 '17 at 16:21

3 Answers


It's worth adding that the excellent fs package offers the dir_tree function, which brings this functionality to R in a very convenient manner.

tmp_dir <- tempdir()
# Create some directories
for (i in 1:10) {
    dir.create(path = file.path(tmp_dir,
                                basename(tempfile(pattern = "dir")),
                                basename(tempfile(pattern = "sub_dir"))),
               recursive = TRUE)
}
# Create directory tree
fs::dir_tree(path = tmp_dir, recurse = TRUE)

Results

/tmp/RtmpEhB0ne
├── dir15213121dd5903
│   └── sub_dir1521315a5425ba
├── dir152131227b086f
│   └── sub_dir1521314255d96b
├── dir152131353e6603
│   └── sub_dir1521315b52aeed
├── dir15213136870535
│   └── sub_dir15213127b34f64
├── dir1521313bbf738b
│   └── sub_dir152131473939ea
├── dir152131403f4fd5
│   └── sub_dir152131115296e7
├── dir152131503d0d55
│   └── sub_dir15213114368572
├── dir1521316f0bb0c3
│   └── sub_dir1521314aea266b
├── dir1521317fe305e9
│   └── sub_dir152131bcfe8a
└── dir1521319800dfb
    └── sub_dir15213129defd4a

In addition to printing the directory tree, the discovered paths can be assigned to an object. Here sink() is used to divert the printed tree to a log file:

sink(file = tempfile(fileext = ".log"))
res_fs_tree <- fs::dir_tree(path = tmp_dir, recurse = TRUE)
sink()
res_fs_tree[[1]]
# [1] "/tmp/RtmpEhB0ne/dir15213121dd5903/sub_dir1521315a5425ba"
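
The question also asked for a table of nodes and a table of edges. As a minimal sketch, assuming relative paths without a leading slash (the question's example paths are used here in place of the result of dir_tree or list.files), a character vector of paths can be decomposed with base R alone. The names ancestors, nodes and edges are illustrative, not part of fs:

```r
# Example paths from the question; in practice this could be the
# character vector returned by fs::dir_tree() or list.files().
paths <- c(
  "root/a/some/file.R",
  "root/a/another/file.R",
  "root/a/another/cool/file.R",
  "root/b/some/data.csv",
  "root/b/more/data.csv"
)

# Hypothetical helper: expand one path into itself plus all of its ancestors.
ancestors <- function(p) {
  parts <- strsplit(p, "/", fixed = TRUE)[[1]]
  vapply(seq_along(parts),
         function(i) paste(parts[seq_len(i)], collapse = "/"),
         character(1))
}
all_paths <- sort(unique(unlist(lapply(paths, ancestors))))

# Node table: one row per directory or file.
nodes <- data.frame(
  path  = all_paths,
  name  = basename(all_paths),
  depth = lengths(strsplit(all_paths, "/", fixed = TRUE)),
  stringsAsFactors = FALSE
)

# Edge table: one row per parent -> child link (the root has no parent).
children <- nodes$path[nodes$depth > 1]
edges <- data.frame(
  from = dirname(children),
  to   = children,
  stringsAsFactors = FALSE
)
```

For paths that exist on disk, an attribute such as the modification date could then be attached to the node table with something like nodes$mtime <- file.info(nodes$path)$mtime.
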
Konrad
  • This is great---I was even using fs and couldn't find this because ```dir_tree``` doesn't seem like it's related to ```print```/viz. – twedl Jul 19 '19 at 11:55
  • Saving to an R object was the real value to me. It would be great if saving to an R object were an optional parameter of the dir_tree function. – hackR Nov 03 '22 at 18:42

Here's a possible approach to get what you originally asked for, which is output like that of the system tree program. It gives a data.tree object that's pretty flexible and could be made to plot the way you want, though it's not entirely clear to me what you want:

path <- c(
    "root/a/some/file.R", 
    "root/a/another/file.R", 
    "root/a/another/cool/file.R", 
    "root/b/some/data.csv", 
    "root/b/more/data.csv"
)


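library(data.tree); library(plyr)

# Split each path into its components, one single-row data frame per path
x <- lapply(strsplit(path, "/"), function(z) as.data.frame(t(z)))
# Stack the rows, padding shorter paths with NA
x <- rbind.fill(x)
# Rebuild the full path string that as.Node() expects in `pathString`
x$pathString <- apply(x, 1, function(x) paste(trimws(na.omit(x)), collapse="/"))
(mytree <- data.tree::as.Node(x))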

1  root                  
2   ¦--a                 
3   ¦   ¦--some          
4   ¦   ¦   °--file.R    
5   ¦   °--another       
6   ¦       ¦--file.R    
7   ¦       °--cool      
8   ¦           °--file.R
9   °--b                 
10      ¦--some          
11      ¦   °--data.csv  
12      °--more          
13          °--data.csv  


plot(mytree)

You can get the parts you want (I think), but you'll have to do the legwork of figuring out the conversion between data types in data.tree: https://cran.r-project.org/web/packages/data.tree/vignettes/data.tree.html#tree-conversion
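
As a rough sketch of that conversion, assuming the data.tree package is installed, its ToDataFrameTree and ToDataFrameNetwork functions can produce something close to the node and edge tables the question asks for. The variable names nodes and edges are illustrative:

```r
library(data.tree)

path <- c(
  "root/a/some/file.R",
  "root/a/another/file.R",
  "root/a/another/cool/file.R",
  "root/b/some/data.csv",
  "root/b/more/data.csv"
)

# Build the tree from a data frame with a pathString column.
mytree <- as.Node(data.frame(pathString = path, stringsAsFactors = FALSE))

# Node table: one row per node, with the fields requested by name.
nodes <- ToDataFrameTree(mytree, "level", "pathString")

# Edge table: one row per parent -> child link, in columns `from` and `to`.
edges <- ToDataFrameNetwork(mytree)
```

The same vignette section also lists as.dendrogram among the available conversions, which may be a starting point for the dendrogram object itself.
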

I use this approach in the tree function of my pathr package when use.data.tree = TRUE: https://github.com/trinker/pathr#tree

EDIT: Per @Luke's comment below, data.tree::as.Node can take the paths directly as a data frame with a pathString column:

(mytree <- data.tree::as.Node(data.frame(pathString = path)))

                levelName
1  root                  
2   ¦--a                 
3   ¦   ¦--some          
4   ¦   ¦   °--file.R    
5   ¦   °--another       
6   ¦       ¦--file.R    
7   ¦       °--cool      
8   ¦           °--file.R
9   °--b                 
10      ¦--some          
11      ¦   °--data.csv  
12      °--more          
13          °--data.csv  
Tyler Rinker
  • Just an FYI, you could replace all of that code and keep just the created `path` variable at the top and `(mytree <- data.tree::as.Node(data.frame(pathString = path)))`. You don't need to use plyr or do any wrangling at all. – Luke W. Johnston Jan 18 '17 at 04:01
  • @Luke Excellent I'll add this to the answer – Tyler Rinker Jan 18 '17 at 14:10
  • `mytree` is supposed to be an environment, right? (class() still gives "Node" "R6") I'm asking, because plot(mytree) in RStudio opens a "Viewer" tab that I have never used and hardly ever noticed. Is there a way to have it plotted in a usual graphics device? The console output is truncated in my case -- that's why I ask. – mattu Feb 04 '17 at 17:03

If you are on Windows, you can use my package dir2json, which is installed like this:

drat::addRepo("stlarepo")
install.packages("dir2json")

It can also be used on Linux, but there the package's shared library is linked against the GHC dynamic libraries, which must be installed on the system (on Windows the DLL is standalone).

> library(dir2json)
> cat(dir2tree("src"))
src
|
`- contrib
   |
   +- PACKAGES.gz
   |
   +- PACKAGES
   |
   +- jsonAccess_0.1.1.tar.gz
   |
   +- expansions_1.2.tar.gz
   |
   `- dir2json_2.1.0.tar.gz
> cat(dir2tree("src", vertical=TRUE))
                                            src                                             
                                             |                                              
                                          contrib                                           
                                             |                                              
      ---------------------------------------------------------------------------           
     /          |                 |                       |                      \          
PACKAGES.gz  PACKAGES  jsonAccess_0.1.1.tar.gz  expansions_1.2.tar.gz  dir2json_2.1.0.tar.gz

The package also contains a Shiny application which generates an interactive Reingold-Tilford tree representation of a folder:

> dir2json::shinyDirTree(".")

[Image: Reingold-Tilford tree of a folder]

Stéphane Laurent
  • Hi @Stéphane Laurent I am having trouble installing the dir2json package. This is the error I receive: ```Error: package or namespace load failed for 'dir2json': .onLoad failed in loadNamespace() for 'rJava', details: call: fun(libname, pkgname) error: JAVA_HOME cannot be determined from the Registry Error: loading failed Execution halted *** arch - x64 ERROR: loading failed for 'i386' * removing 'C:/Users//Documents/R/win-library/3.6/dir2json' Warning in install.packages : installation of package ‘dir2json’ had non-zero exit status``` – Jasppo Apr 10 '20 at 17:52
  • It seems like R cannot open this URL, which does not exist for me: ```Warning in install.packages : unable to access index for repository https://stlarepo.github.io/drat/bin/windows/contrib/3.6: cannot open URL 'https://stlarepo.github.io/drat/bin/windows/contrib/3.6/PACKAGES'``` – Jasppo Apr 10 '20 at 18:35
  • @Jasppo Ah yes, it does not exist for R 3.6. Can you try `install.packages("dir2json", type = "source")`? – Stéphane Laurent Apr 10 '20 at 18:43