14

So I just finished up writing my first script to assemble a Weibull Analysis from a text file. In all of my tinkering I suspect I may have loaded some libraries that aren't used in the final script. Is there a quick way to check which libraries are being used by the script without checking each function?

Brian Riggs
  • How many packages did you load? How long does it take the script to run? – Dason Feb 05 '15 at 15:09
  • Depending on the time the script takes to run: Just restart R (i.e. make sure no packages are loaded) and run the script until something fails because a library is missing, load the library, and repeat? – Eike P. Feb 05 '15 at 15:15
  • For someone feeling ambitious about this, there are several other questions on StackOverflow which cover this ground, none definitively. Consider these both starting places to research what others have done and potential duplicates once this has a good answer. https://stackoverflow.com/q/44123296/892313 https://stackoverflow.com/q/18300679/892313 https://stackoverflow.com/q/8761857/892313 https://stackoverflow.com/q/34228601/892313 – Brian Diggs Feb 27 '23 at 21:05
  • If you make your analysis a package you might replace library calls by `#' @import pkg` roxygen comments and the checks will tell you which packages aren't used – moodymudskipper Feb 28 '23 at 20:52
  • Having enough packages attached to warrant such a question is a sign that you might be attaching packages too carelessly. You might remove all but those you use a lot and use the `::` notation wherever your analysis fails, if it runs in a reasonable time – moodymudskipper Feb 28 '23 at 20:57

3 Answers

4

Here is a script that should find packages which are loaded by a script but never used in it. It needs to be run in a clean session, because there is no way to verify that the state of your current session is the same as what the script would create. It assumes that packages are only loaded with either library or require, which is good practice anyway. I have not extensively tested it, but it seems fairly sound.

Explanation of how the code works is in the comments. This was an entertaining exercise writing this purely in base R so that it itself doesn't have to load any packages.

The idea to use getParseData as the starting point came from Eric Green's answer to this related question

# Define the file to test in the line below. That is the only per-run configuration needed.
fileToTest <- "Plot.R"

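# Get the parse data for the file. keep.source = TRUE is set explicitly because
# non-interactive sessions (e.g. Rscript) do not keep source references by
# default, in which case getParseData() would return NULL.
parseData <- getParseData(parse(fileToTest, keep.source = TRUE), includeText = TRUE)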

# Extract all the function calls and keep a unique list of them.
functionCalls <- unique(parseData[parseData$token == "SYMBOL_FUNCTION_CALL", "text"])

# Look for any calls to `library` or `require` and go two steps up the
# call tree to find the complete call (with arguments).
libraryCalls <- parseData[parseData$token == "SYMBOL_FUNCTION_CALL" & parseData$text %in% c("library", "require"),]
libraryCalls <- parseData[parseData$id %in% libraryCalls$parent,]
libraryCalls <- parseData[parseData$id %in% libraryCalls$parent,]
libraryCalls <- libraryCalls$text

# Execute all the library/require calls to attach them to this session
eval(parse(text = libraryCalls))

# For each function called,
# * Use `getAnywhere` to find out where it is found. That information is in a character
# vector which is the `where` component of the returned list.
# * From that vector of locations, keep only the ones starting with "package:",
# getting rid of those starting with "namespace:".
# * Take the first one of these which should be the first package that the
# function is found in and thus would be the one used.
names(functionCalls) <- functionCalls
matchPkg <- vapply(functionCalls, 
                   FUN = (\(f) grep("^package:", getAnywhere(f)$where, value = TRUE)[1]), 
                   FUN.VALUE = character(1))

# get a list of all packages from the search path, keep only those that are
# actually packages (not .GlobalEnv, Autoloads, etc.), ignore those that are
# automatically attached (base, methods, datasets, utils, grDevices, graphics, stats),
# and then see of those which ones did not show up in the list of packages used
# by the functions.
packages <- search()
packages <- grep("^package:", packages, value = TRUE)
packages <- setdiff(packages, c("package:base", "package:methods", "package:datasets", "package:utils", "package:grDevices", "package:graphics", "package:stats"))
packages <- setdiff(packages, unique(matchPkg))

# Report results
if(length(packages) > 0) {
  cat("Unused packages: \n"); 
  print(packages)
} else {
  cat("No unused packages found.\n")
}
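
To see what the parse data looks like before running this on a real file, here is a minimal sketch on a made-up two-line snippet (the snippet and the variable names src, pd, sym, up1 and full are only for illustration). It shows the SYMBOL_FUNCTION_CALL tokens and the "two steps up" walk that recovers the complete library call:

```r
# Hypothetical two-line script, parsed from a string instead of a file.
src <- 'library(stats)\nx <- median(c(1, 2, 3))'
pd <- getParseData(parse(text = src, keep.source = TRUE), includeText = TRUE)

# All function-call tokens found in the snippet
calls <- unique(pd[pd$token == "SYMBOL_FUNCTION_CALL", "text"])
print(calls)

# Walk two levels up from the `library` symbol to recover the whole call:
# symbol -> enclosing expr -> expr for the complete call with its arguments
sym  <- pd[pd$token == "SYMBOL_FUNCTION_CALL" & pd$text == "library", ]
up1  <- pd[pd$id %in% sym$parent, ]
full <- pd[pd$id %in% up1$parent, "text"]
print(full)
```

The last value should be the text of the full call, which is what gets passed to eval(parse(text = ...)) in the script above.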

Brian Diggs
2

If you attach packages via library or require, it's easiest to search your code for those calls. If you use packages without attaching them, via the <package>::<export> syntax, then search for ::. If you're worried about transitive dependencies, or would just like to create a reproducible environment, look at the packrat package: http://rstudio.github.io/packrat/
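
A rough sketch of that search in base R, assuming the script has been read into a character vector of lines (the example lines and the names attach_pat, ns_pat, attached and qualified are made up for illustration). Being a plain text search, it will also pick up matches inside comments and strings:

```r
# Hypothetical script contents; in practice use readLines("your_script.R").
lines <- c(
  'library(ggplot2)',
  'require("dplyr")',
  'x <- stringr::str_trim("  a  ")',
  'y <- mean(1:10)'
)

# Packages attached via library() or require(), quoted or unquoted
attach_pat <- "(?:library|require)\\(\\s*[\"']?([A-Za-z0-9.]+)"
m <- regmatches(lines, regexec(attach_pat, lines, perl = TRUE))
attached <- unlist(lapply(m, function(x) if (length(x) == 2) x[2]))

# Packages referenced through the pkg::name (or pkg:::name) syntax
ns_pat <- "[A-Za-z0-9.]+(?=:::?)"
qualified <- unique(unlist(regmatches(lines, gregexpr(ns_pat, lines, perl = TRUE))))

print(attached)
print(qualified)
```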

Lev Kuznetsov
1

This is not particularly beautiful or efficient, but it should do the job (in most cases):

library("stringr")

script_path <- "/path/to/your/script.R"
# Package names may contain letters, digits and dots (e.g. "MASS", "data.table")
load_command_pattern <- "library\\(\"[A-Za-z0-9.]+\"\\)"

text <- readChar(script_path, file.info(script_path)$size)
pck <- str_extract_all(text, pattern = load_command_pattern)

# Find all instances where packages are loaded
packages <- list()
for(i in seq_along(pck[[1]])){
  p <- pck[[1]][i]
  name <- str_extract(gsub("library", "", p), "[A-Za-z0-9.]+")
  packages <- append(packages, name, after = length(packages))
}

# Load packages (seq_along avoids iterating over 1:0 if none were found)
for(i in seq_along(packages)){
  library(packages[[i]], character.only = TRUE)
}

# Make a list to store packages from which no function is called
remove <- list()
for(i in 1:length(packages)){
  p <- packages[[i]]
  # list all functions contained in the package
  funs <- ls(paste0("package:", p))
  # add an opening bracket to make sure to only find functions, not comments etc.
  functions <- paste0(funs, "(")
  # for every function in the package, check whether its name appears in the script
  # (fixed = TRUE, so names such as `[<-` or `%in%` are not interpreted as regexes)
  in_script <- vapply(functions, grepl, logical(1), x = text, fixed = TRUE)
  # if none of the functions are contained in the script, add the package to the list
  if(!any(in_script)){
    remove <- append(remove, p)
  }
}

# Remove loading commands for all unused packages
# (seq_along avoids an error when no package needs to be removed)
for(i in seq_along(remove)){
  to_remove <- paste0("library(\"", remove[[i]], "\")")
  text <- gsub(to_remove, "", text, fixed = TRUE)
}

# Save output (to a new file! Don't overwrite your existing script without testing!)
sink(file = "/path/to/your/new_script.R")
cat(gsub("\\r", "", text))
sink()

Note that I assumed you loaded packages using library("package_name"). You might need to adjust the regex pattern.

What the code is supposed to do:

  1. Read your R script as text.
  2. Find all instances where you load a package. In this example, I am specifically searching for the call library(...). Here, we extract the package name, assuming it only consists of the characters allowed by the regex pattern.
  3. Load the packages, and list the functions they contain. If none of the functions is found in your script, append the package name to the list of packages to remove.
  4. Remove all instances where you load an unnecessary package. (You could also remove the line break.)
  5. Write the text of your script to a new file. Check whether the output looks like it was intended and test the new script.

Note that this is not perfect. For example, functions with the same name may occur in multiple packages. Moreover, the code currently does not distinguish between a match of the full function name and a match of the end of a longer name: searching for my_function( will give a false positive for another_my_function(. You could add an additional check to see whether a symbol, line break or space precedes the function name. However, I assume the code should work for most cases.
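
One possible form of that additional check, as a sketch rather than a tested solution: the helper calls_function below is hypothetical, and it assumes identifiers consist of letters, digits, dots and underscores. It uses a Perl-style lookbehind so that a match is rejected when the preceding character could belong to a longer name:

```r
# Hypothetical helper: TRUE if `fun` is called in `text` as a whole name.
calls_function <- function(text, fun) {
  # \\Q...\\E quotes the function name literally; the negative lookbehind
  # rejects matches preceded by a character that may be part of an identifier.
  pattern <- paste0("(?<![A-Za-z0-9._])\\Q", fun, "\\E\\s*\\(")
  grepl(pattern, text, perl = TRUE)
}

script_text <- "y <- another_my_function(1)\nz <- str_trim(' a ')"
calls_function(script_text, "my_function")  # FALSE: only part of a longer name
calls_function(script_text, "str_trim")     # TRUE: called as a whole name
```

This still matches function names inside comments and strings, so it narrows the false positives rather than eliminating them.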

Of course, if you load all packages at the beginning of your script, you could e.g. create the list of loaded packages manually. Similarly, you could print out the list of unused packages and remove them manually.

Manuel Popp