48

I have a very long R script with many if statements and exception cases. As I've gone along, I've been importing and testing libraries without really documenting them. The problem is that if I run this from a clean installation, I'm not sure which statements the script will run, and so which libraries will be needed.

My question is: Is there any R function to test which libraries are being used in a script?

EDIT: I have not used all of the libraries that have been installed, so print(sessionInfo()) won't be useful. I just want to start the script with an install.packages call.

smci
aeongrail
  • [this is what are you looking for](http://stackoverflow.com/questions/9341635/how-can-i-check-for-installed-r-packages-before-running-install-packages) – Ethaan Feb 14 '15 at 07:23
  • 4
    @Ethaan that is not really what he is asking – nico Feb 14 '15 at 08:14
  • I think you're looking for a spading/neutering tool of the author(s) of the script. I think you're doomed to running the script and installing packages as you go along, figuring out which function comes from which package. Good luck with same name functions across different packages (this is where the tool comes handy). I find `library("sos");findFn("foo")` handy for looking up functions. – Roman Luštrik Feb 14 '15 at 08:19
  • 1
    @Ethaan no worries, it can actually be a useful link as well! – nico Feb 14 '15 at 08:22
  • I think the answer by eh21 should be accepted. – Matteo Aug 11 '21 at 10:34

7 Answers

28

I found the list.functions.in.file() function from NCmisc (install.packages("NCmisc")) quite helpful for this:

list.functions.in.file(filename, alphabetic = TRUE)

For more info see this link: https://rdrr.io/cran/NCmisc/man/list.functions.in.file.html

eh21
  • 3
    Why did this get downvoted? Is there a reason why this isn't a preferred option? – LCM Mar 22 '19 at 20:28
  • 4
    Just a note -- you need to load the packages first, otherwise NCmisc doesn't know from what package the function comes. – RobertMyles Nov 19 '19 at 16:21
  • 2
    If you're using RStudio and want to use this to check the script you have open, run `list.functions.in.file(rstudioapi::getSourceEditorContext()$path, alphabetic = TRUE)` – Matthew Law Feb 16 '21 at 15:40
  • One problem with this approach is that it doesn't actually pay attention to the order in which packages are loaded so it will show functions as coming from multiple packages when in the reality of the script it would be coming from a specific one. You know any alternatives that can figure this out better? – Bryan Shalloway Dec 21 '21 at 01:29
  • I wrote a package {funspotr} that essentially does list.functions.in.file but outputs things in a dataframe format and makes a few other small changes: https://github.com/brshallo/funspotr – Bryan Shalloway Feb 09 '22 at 07:47
  • `list.functions.in.file` does not work if there are Spanish or other unusual characters in your code files. – Captain Tyler May 20 '22 at 20:54
15

The ‘renv’ package provides a robust solution for this nowadays via renv::dependencies.

renv::dependencies performs proper static analysis and reliably finds package dependencies even when they are declared in non-standard ways (e.g. via box::use) or via a package DESCRIPTION file rather than via library or ::.
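As a minimal sketch of what that looks like for a single file (this assumes 'renv' is installed; the two-line script written to a temporary file is made up for illustration, and the packages it names do not need to be installed because the file is only parsed, never run):

```r
# Assumes the 'renv' package is installed. The script contents are a made-up example.
script <- tempfile(fileext = ".R")
writeLines(c(
  "library(dplyr)",
  "found <- stringr::str_detect(letters, 'a')"
), script)

# dependencies() analyses the file without executing it and returns a data
# frame with one row per detected dependency; the Package column has the names.
deps <- renv::dependencies(script, quiet = TRUE, progress = FALSE)
unique(deps$Package)
```

Pointing `renv::dependencies()` at a directory instead of a file scans the R files it finds there.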


As a quick hack I’ve previously (pre-‘renv’) used a shell script for this:

#!/usr/bin/env bash

source_files=($(git ls-files '*.R'))
grep -hE '\b(require|library)\([\.a-zA-Z0-9]*\)' "${source_files[@]}" | \
    sed '/^[[:space:]]*#/d' | \
    sed -E 's/.*\(([\.a-zA-Z0-9]*)\).*/\1/' | \
    sort -uf \
    > DEPENDS

This uses Git to collect all R files under version control in a project. Since you should be using version control anyway this is normally a good solution (although you may want to adapt the version control system). For the few cases where the project isn’t under version control you should (1) put it under version control. Or, failing that, (2) use find . -regex '.*\.[rR]' instead of git ls-files '*.R'.

And it produces a DEPENDS file containing a very simple list of dependencies.

It only finds direct calls to library and require though – if you wrap those calls, the script won’t work.
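For comparison, here is a rough base-R sketch of the same idea that works on parsed tokens rather than raw text, so commented-out calls are skipped and `pkg::fun` references are picked up too. The helper name `scan_pkgs` is made up for this example. It shares the limitation above: it only sees literal `library(pkg)` and `require(pkg)` calls, and it will misread `library(x, character.only = TRUE)` as a package named `x`.

```r
# Hypothetical helper: list packages referenced through library(), require(),
# or the :: / ::: operators, without running the code.
scan_pkgs <- function(code) {
  parsed <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)
  tokens <- tokens[tokens$terminal, ]  # keep leaf tokens only, in source order

  # Package names on the left-hand side of :: and :::
  qualified <- tokens$text[tokens$token == "SYMBOL_PACKAGE"]

  # For library(pkg) / require(pkg): the call token is followed by "(" and
  # then by the first argument, so look two tokens ahead.
  i <- which(tokens$token == "SYMBOL_FUNCTION_CALL" &
               tokens$text %in% c("library", "require"))
  i <- i[i + 2 <= nrow(tokens)]
  arg <- tokens[i + 2, ]
  attached <- arg$text[arg$token %in% c("SYMBOL", "STR_CONST")]
  attached <- gsub("^[\"']|[\"']$", "", attached)  # strip quotes from "pkg"

  sort(unique(c(qualified, attached)))
}

example <- c(
  "library(dplyr)",
  "# library(ggplot2)",
  "require(\"stringr\")",
  "x <- readr::read_csv(\"data.csv\")"
)
scan_pkgs(example)
#> [1] "dplyr"   "readr"   "stringr"
```

To scan a file rather than a character vector, pass `readLines("script.R")` as `code`.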

Konrad Rudolph
  • 9
    I don't think the OP is asking for this, but I may be misunderstanding the question. What I think he is asking is: he has loaded several libraries and is not sure which one are unnecessary. – nico Feb 15 '15 at 00:45
  • 1
    If you use `[\.a-zA-Z0-9]` instead of `\w` and `[[:alnum:]]` you capture all valid R package names. – calder-ty May 17 '17 at 17:42
  • This is great. Can we somehow get this into `usethis`? (Also, it does not currently handle requirements not attached but accessed by `::` or `:::`; http://adv-r.had.co.nz/Expressions.html#ast-funs might be a good starting point for a more general, R-based implementation.) Then again, I guess I should be using roxygen... – jan-glx Jun 28 '21 at 11:04
  • 1
    @jan-glx Honestly I wouldn’t use this snippet today. If I had to implement this myself I would do a proper static analysis, probably based off of the ‘codetools’ package. That said, ‘renv’ already implements this, and does a *much* better job, since it also supports non-standard ways of declaring package dependencies (e.g. via `box::use`). I’ve updated my answer to reflect this. – Konrad Rudolph Jun 28 '21 at 11:06
6

Based on everyone's response, especially eh21's suggestion of the NCmisc package, I put together a little function that outputs a list of packages used in all your R scripts in a directory, as well as their frequencies.

library(NCmisc)
library(stringr)
library(dplyr)

checkPacks<-function(path){

    ## get all R files in your directory
    ## by the way, extract R code from Rmd: http://felixfan.github.io/extract-r-code/
    ## (the dot is escaped so that only a real .R or .r extension matches)
    files<-list.files(path, pattern="\\.[Rr]$")

    ## extract all functions and which package they are from 
    ## using NCmisc::list.functions.in.file
    funs<-unlist(lapply(paste0(path, "/", files), list.functions.in.file))
    packs<-funs %>% names()

    ## "character" functions such as reactive objects in Shiny
    characters<-packs[str_detect(packs, "^character")]

    ## user defined functions in the global environment
    globals<-packs[str_detect(packs, "^.GlobalEnv")]

    ## functions that are in multiple packages' namespaces 
    multipackages<-packs[str_detect(packs, ", ")]

    ## get just the unique package names from multipackages
    mpackages<-multipackages %>%
               str_extract_all(., "[a-zA-Z0-9]+") %>%
               unlist() %>%
               unique()
    mpackages<-mpackages[!mpackages %in% c("c", "package")]

    ## functions that are from single packages
    packages<-packs[str_detect(packs, "package:") & !packs %in% multipackages] %>%
              str_replace(., "[0-9]+$", "") %>%
              str_replace(., "package:", "") 

    ## unique packages
    packages_u<-packages %>%
                unique() %>%
                union(., mpackages)

    return(list(packs=packages_u, tb=table(packages)))

}

checkPacks("~/your/path")
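To get from that list to the install.packages call the question asks for, one option is to install only what is missing. This is a minimal sketch: `missing_pkgs` is a made-up helper name, and `needed` stands in for the vector of package names found by the function above.

```r
# Hypothetical helper: which of the needed packages are not installed yet?
missing_pkgs <- function(needed) {
  setdiff(needed, rownames(utils::installed.packages()))
}

# Base packages ship with R, so nothing is reported as missing here.
missing_pkgs(c("stats", "utils"))
#> character(0)

# At the top of a real script, `needed` would hold the packages found above:
# to_install <- missing_pkgs(needed)
# if (length(to_install) > 0) install.packages(to_install)
```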
SibyllWang
  • Works nicely, but it only checks loaded libraries. A solution is to load all installed packages as described in this blog article: https://www.r-bloggers.com/loading-all-installed-r-packages/ In short: `lapply(.packages(all.available = TRUE), function(xx) library(xx, character.only = TRUE))` – Sebastian Müller Sep 14 '19 at 14:44
  • Nice, but you should let the regex in the first list.files call be ".R$|.r$" so that files ending in .r are also picked up (like mine; I never use capital letters in programming-related folders at all). – emilBeBri Feb 19 '21 at 17:44
  • Also you can use the pattern argument for the regex instead of using string, like this: – emilBeBri Feb 19 '21 at 18:02
  • files <- list.files(path, pattern='.R$|.r$') – emilBeBri Feb 19 '21 at 18:02
5

I am not sure of a good way to automate this, but what you could do is:

  1. Open a new R console
  2. Check with sessionInfo that you don't have extra packages loaded.
    If you load extra packages by default (e.g. via your .Rprofile file), I suggest you avoid doing that, as it's a recipe for disaster.
    Normally you should only have the base packages loaded: stats, graphics, grDevices, utils, datasets, methods, and base.

    You can unload any extra libraries using:

    detach("package:<packageName>", unload=TRUE)
    
  3. Now run the script after commenting out all of the library and require calls and see which functions give an error.

  4. To find out which package provides each function, type in the console:

    ??<functionName>
    
  5. Load the required packages and re-run steps 3-5 until satisfied.
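Step 3 can be partly automated without running the script. The sketch below lists every function the code calls that cannot be found in the current session. The helper name `unresolved_funs` is made up, and the approach has blind spots: it has to be run in a fresh session with no extra packages attached, and functions that the script defines itself are reported as well, because the code is only parsed.

```r
# Hypothetical helper: report functions that a script calls but that are not
# visible from the global environment of the current session.
unresolved_funs <- function(code) {
  tokens <- utils::getParseData(parse(text = code, keep.source = TRUE))
  called <- unique(tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"])
  found <- vapply(
    called,
    function(f) exists(f, envir = globalenv(), mode = "function"),
    logical(1)
  )
  called[!found]
}

script <- c(
  "df <- data.frame(x = 1:3)",
  "out <- str_detect(names(df), \"x\")",
  "print(out)"
)

# In a session where stringr is not attached, only str_detect is unresolved.
unresolved_funs(script)
```

Each name it returns is then a candidate for the `??<functionName>` lookup in step 4.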

nico
  • Yeah, this is pretty much what I'm doing at the moment. I'll probably just end up having a few statements at the beginning installing all the packages that I currently have downloaded, even if they're not useful. – aeongrail Feb 15 '15 at 20:33
2

You might want to look at the checkpoint function from Revolution Analytics on GitHub here: https://github.com/RevolutionAnalytics/checkpoint

It does some of this, and solves the problem of reproducibility. But I don't see that it can report a list of what you are using.

However, if you look at the code you will probably get some ideas.

Mike Wise
2

I had a similar need when converting my code into a package: I had to identify every package dependency and either import it or use the fully qualified name.

While reading the book Extending R, I found that XRtools::makeImports can scan a package and find all packages that need to be imported. This doesn't solve our problem yet, as it only applies to an existing package, but it provided the main insight into how to do it.

I wrote a function and put it into my package mischelper. You can install the package and then either use the RStudio addin menu to scan the current file or selected code, or use the command-line functions. Every external function (fun_inside) and the function that called it (usage) will be listed in a table.

[Image: screenshot of the resulting table]

You can now go to each function and press F1 to find which package it belongs to. I actually have another package that can scan all installed packages for function names and build a database, but that may cause more false positives for this use, because if you have only loaded some packages, pressing F1 only searches the loaded packages.

See details of the usage on my package page:

https://github.com/dracodoc/mischelper

dracodoc
1

I'd trust the {renv}-based solutions the most for identifying package dependencies.

That said, I wrote a package, funspotr, that contains similar functionality to the answers mentioning NCmisc::list.functions.in.file() and can be used for parsing the functions or packages in a file or files:

library(dplyr)

funspotr::spot_pkgs("https://gist.githubusercontent.com/brshallo/4b8c81bc1283a9c28876f38a7ad7c517/raw/b399b768e900a381d99f5120e44d119c7fb40ab9/source_rmd.R")
#> [1] "knitr"    "magrittr" "stringr"  "readr"    "purrr"    "glue"

funspotr::spot_funs("https://gist.githubusercontent.com/brshallo/4b8c81bc1283a9c28876f38a7ad7c517/raw/b399b768e900a381d99f5120e44d119c7fb40ab9/source_rmd.R") %>% 
  select(-in_multiple_pkgs)
#> # A tibble: 13 x 2
#>    funs        pkgs   
#>    <chr>       <chr>  
#>  1 tempfile    base   
#>  2 purl        knitr  
#>  3 getOption   base   
#>  4 options     base   
#>  5 .Call       base   
#>  6 source      base   
#>  7 library     base   
#>  8 read_file   readr  
#>  9 map         purrr  
#> 10 str_extract stringr
#> 11 glue        glue   
#> 12 str_c       stringr
#> 13 write_file  readr
Bryan Shalloway