
I want to compute the size of a directory in R. I tried the file.info function, but unfortunately it follows symbolic links, so my results are inflated:

# returns the wrong size: files reached through symlinks are counted more than once
sum(file.info(list.files(path = '/my/directory/', recursive = TRUE, full.names = TRUE))$size)

How do I compute the size of a directory so that I get the same result as on Linux, e.g. with du -s?

Thanks

Carmellose

5 Answers


I finally used this:

system('du -s')
Carmellose
  • Could you expand on this? Thank you. – MadmanLee Feb 05 '19 at 17:02
  • @MadmanLee R's system() function invokes a system command. On Linux, if you call it with the `du` shell command, it prints the size of the directory (see https://linux.die.net/man/1/du). On Windows, you will want to call a Windows shell command instead. – Carmellose Feb 05 '19 at 17:27
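
To use the number in R rather than just print it, the output of `du` can be captured and parsed. The following is a minimal sketch, assuming a Unix-like system with `du` on the PATH; the helper name `du_kb` is hypothetical, and `-k` asks `du` to report in units of 1024 bytes.

```r
# Hypothetical helper: disk usage of a directory as reported by `du -sk`.
# Assumes a Unix-like system where `du` is available on the PATH.
du_kb <- function(path) {
  stopifnot(is.character(path), length(path) == 1, dir.exists(path))
  # stdout = TRUE returns the command's output as a character vector
  out <- system2("du", c("-sk", shQuote(path)), stdout = TRUE)
  # du prints "<size><whitespace><path>"; keep only the leading number
  as.numeric(sub("\\s.*$", "", out[1]))
}

# Usage: size of the current working directory in kilobytes
du_kb(getwd())
```

Because the value comes from `du` itself, symbolic links are handled exactly as they are on the command line.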
system('powershell -noprofile -command "ls -r|measure -s Length"')

References:

  1. https://technet.microsoft.com/en-us/library/ff730945.aspx
  2. Get Folder Size from Windows Command Line
  3. https://stat.ethz.ch/R-manual/R-devel/library/base/html/system.html
  4. https://superuser.com/questions/217773/how-can-i-check-the-actual-size-used-in-an-ntfs-directory-with-many-hardlinks

You can also use Cygwin if you have it; it lets you run Linux commands and get comparable results. There is also a nice solution using Sysinternals in the last link above.

Hack-R

A base-R solution, which can be very useful for checking the size of a package.

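dir_size <- function(path, recursive = TRUE) {
  stopifnot(is.character(path))
  # full paths of all files under `path`
  files <- list.files(path, full.names = TRUE, recursive = recursive)
  # file.size() is vectorised and returns sizes in bytes;
  # calling it directly also gives 0 for an empty directory
  vect_size <- file.size(files)
  size_files <- sum(vect_size)
  size_files
}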

cat(dir_size(find.package("Rcpp"))/10**6, "MB")
#> 14.81649 MB

Created on 2021-06-26 by the reprex package (v2.0.0)

polkas

"file.size" returns the logical size of each file in bytes, whereas the "size on disk" is the amount of space the file actually occupies on the disk. See https://superuser.com/questions/66825/what-is-the-difference-between-size-and-size-on-disk to understand the difference. Try this for the total size of all files:

files <- list.files(path_of_directory, full.names = TRUE, recursive = TRUE)
# file.size() is vectorised, so no sapply() is needed
vect_size <- file.size(files)
size_files <- sum(vect_size)
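
Summing `file.size` over a recursive listing does not tell symbolic links apart from regular files, which is the double counting described in the question. The following is a minimal base-R sketch of one way around it, assuming a platform where `Sys.readlink` can detect links (on Windows it always returns ""). The function name `dir_size_nolinks` is hypothetical. It walks the tree one level at a time and skips every symlink, so linked files and linked directories are not counted.

```r
# Hypothetical helper: total logical size in bytes, ignoring symbolic links.
dir_size_nolinks <- function(path) {
  # Non-recursive listing: includes hidden entries and subdirectories
  entries <- list.files(path, all.files = TRUE, full.names = TRUE, no.. = TRUE)
  # Sys.readlink() gives "" for non-links, the target for links, NA on error
  links <- Sys.readlink(entries)
  entries <- entries[!is.na(links) & links == ""]
  is_dir <- dir.exists(entries)
  # Regular files: add up their sizes
  file_bytes <- sum(file.size(entries[!is_dir]), na.rm = TRUE)
  # Real (non-link) subdirectories: recurse
  dir_bytes <- sum(vapply(entries[is_dir], dir_size_nolinks, numeric(1)))
  file_bytes + dir_bytes
}
```

The result is still a sum of logical file sizes, so it will usually differ somewhat from `du -s`, which reports allocated disk blocks.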
moodymudskipper
islem

I dealt with this problem recently, and here is my code:

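library(pacman)
p_load(fs, tidyfst)  # load fs and tidyfst, installing them if missing

# time the scan; dir_info() returns one row per entry, including its size
# (by default only the top level is listed; pass recurse = TRUE for subdirectories)
sys_time_print({
  dir_info(your_directory_path) -> your_dir_info
})

# total of the size column
your_dir_info %>% 
  summarise_dt(size = sum(size, na.rm = TRUE))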

The first time I ran the code above, it took about 3 minutes to scan 52 GB of data (174,731 separate files). When I ran it again, it took less than 6 seconds. This is amazing.
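
The table returned by `dir_info` also has a `type` column, which makes it possible to leave symbolic links out of the total. The following is a minimal sketch using only the fs package, assuming fs is installed; the function name `fs_dir_size` is hypothetical, and `recurse = TRUE` and `all = TRUE` are set so that subdirectories and hidden files are included.

```r
# Hypothetical helper: total size in bytes of regular files under `path`.
# Assumes the fs package is installed.
fs_dir_size <- function(path) {
  info <- fs::dir_info(path, recurse = TRUE, all = TRUE)
  # keep regular files only; entries that are symlinks have type "symlink"
  is_file <- info$type == "file"
  sum(as.numeric(info$size[is_file]), na.rm = TRUE)
}
```

As with the other R-only approaches, this adds up logical file sizes rather than the disk blocks that `du -s` reports.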

Hope