Is there a quick way to scan an R script and determine which packages are actually used? By this I mean looking at all of the functions called in the script and returning a list of packages that contain these function names? (I know that function names are not exclusive to any one package)

Why not just look at packages called by library() or require()? Right. Well, I have a bad habit of loading packages I often use regardless of whether I actually use them in the script.

I'd like to clean up some scripts that I intend to share with others by removing unused packages.

I resolve to change my ways in 2016. Please help me get started.

Update

Some good ideas in the comments...

# create an R file that uses a few functions

fileConn<-file("test.R")
writeLines(c("df <- data.frame(v1=c(1, 1, 1), v2=c(1, 2, 3))",
             "\n",
             "m <- mean(df$v2)",
             "\n",
             "describe(df)  #psych package"),
           fileConn)
close(fileConn)

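# getParseData approach
# keep.source = TRUE keeps the parse data even when run non-interactively (e.g. via Rscript)
pkg <- getParseData(parse("test.R", keep.source = TRUE))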
pkg <- pkg[pkg$token=="SYMBOL_FUNCTION_CALL",]
pkg <- pkg[!duplicated(pkg$text),]
pkgname <- pkg$text
pkgname
# [1] "data.frame" "c"          "mean"       "describe" 

Update 2

An ugly attempt to implement @nicola's idea:

# load all probable packages first
pkgList <- list(pkgname)
for (i in 1:length(pkgname)) {
  try(print(packageName(environment(get(pkgList[[1]][i])))))
}

It does not like the c() function, but the results seem otherwise correct.

#[1] "base"
#Error in packageName(environment(get(pkgList[[1]][i]))) : 
#  'env' must be an environment
#[1] "base"
#[1] "psych"
Eric Green
  • I hope this isn't pedantic, but: can you just manually uncheck all the packages in the Packages menu in RStudio... then run your script and see what functions it can't find? If you recognize "Oh, `melt` needs `reshape2`" then you can go in and load it at the top of your script. – Nancy Dec 11 '15 at 17:00
  • Maybe `getParseData`, as outlined [here](http://stackoverflow.com/q/33064376/324364)? – joran Dec 11 '15 at 17:01
  • I'd try to determine the functions called in the script and then run `packageName(environment(function))` over them, which should give the package of each function. Next, you can check the loaded packages and compare them with the ones actually used. – nicola Dec 11 '15 at 17:06
  • @nicola Sounds like the first part would be `getParseData` and then lookup the package name as you described... – joran Dec 11 '15 at 17:08
  • Possible dupe: [How can I tell which packages I am not using in my R script?](http://stackoverflow.com/q/29415614/903061) – Gregor Thomas Dec 11 '15 at 17:15
  • Thanks, @joran. I added an example using `getParseData()`. Trying to incorporate @nicola's `packageName()` idea. Yes, @Gregor, looks like a dup that I did not find. Thanks. Does not seem to have a satisfactory answer. – Eric Green Dec 11 '15 at 17:29
  • dumb question: `packageName(environment(mean))` will tell me that `mean()` is a function in the `base` package, and I can pass `pkgname[3]`, which is "mean", to `packageName(environment(pkgname[3]))`, but R evaluates `pkgname[3]` as a string...how to make R see mean as `mean`, not "mean". – Eric Green Dec 11 '15 at 20:30

1 Answer

An answer based on ideas in the question comments. The key functions are getParseData() and packageName().

# create an R file that uses a few functions

fileConn<-file("test.R")
writeLines(c("df <- data.frame(v1=c(1, 1, 1), v2=c(1, 2, 3))",
             "\n",
             "m <- mean(df$v2)",
             "\n",
             "describe(df)  #psych package"),
           fileConn)
close(fileConn)

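# getParseData approach
# keep.source = TRUE keeps the parse data even when run non-interactively (e.g. via Rscript)
pkg <- getParseData(parse("test.R", keep.source = TRUE))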
pkg <- pkg[pkg$token=="SYMBOL_FUNCTION_CALL",]
pkg <- pkg[!duplicated(pkg$text),]
pkgname <- pkg$text
pkgname
# [1] "data.frame" "c"          "mean"       "describe" 

# load all probable packages first, so that get() can find their functions
for (fn in pkgname) {
  # try() lets the loop continue when one lookup fails
  try(print(packageName(environment(get(fn)))))
}

#[1] "base"
#Error in packageName(environment(get(fn))) : 
#  'env' must be an environment
#[1] "base"
#[1] "psych"
# The error comes from c(): it is a primitive, so environment(c) is NULL.
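
The error on c() can be avoided by treating primitives separately, since they all live in base and have no enclosing environment. Below is a minimal sketch of that idea, not a definitive implementation. The helper name `find_pkg` and the made-up name `no_such_function_xyz` are hypothetical, introduced only for illustration. Functions that cannot be found, or that are defined outside any package, come back as NA.

```r
# Hypothetical helper: map a function name to the package that defines it
find_pkg <- function(fname) {
  # Not visible on the search path (package not loaded, or a typo)
  if (!exists(fname, mode = "function")) return(NA_character_)
  f <- get(fname, mode = "function")
  # Primitives such as c() have no environment; they all belong to base
  if (is.primitive(f)) return("base")
  pn <- packageName(environment(f))
  # packageName() returns NULL for functions defined outside a package
  if (is.null(pn)) NA_character_ else pn
}

fnames <- c("data.frame", "c", "mean", "no_such_function_xyz")
pkgs <- vapply(fnames, find_pkg, character(1))
pkgs
```

As in the loop above, the packages the script might use still have to be loaded first, otherwise their functions are reported as NA.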

I'll mark this as correct for now, but happy to consider other solutions.
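
To get from there to the cleanup described in the question, the packages a script attaches can be read from the same parsed code and compared with the packages that turn out to be used. The sketch below is only one way to do this, under stated assumptions: the example script text is invented for illustration, only top-level `library()` and `require()` calls with a literal package name are detected, and calls using `character.only = TRUE` are not handled.

```r
# Example script text (assumed for illustration); parsing does not require
# the packages to be installed
code <- c(
  "library(psych)",
  "library(ggplot2)",
  "require(\"stats\")",
  "x <- stats::sd(c(1, 2, 3))",
  "describe(x)"
)
exprs <- parse(text = code, keep.source = TRUE)

# Packages attached by top-level library()/require() calls
attached <- unlist(lapply(exprs, function(e) {
  if (is.call(e) && length(e) >= 2 &&
      as.character(e[[1]])[1] %in% c("library", "require")) {
    as.character(e[[2]])
  }
}))

# Packages referenced explicitly with the pkg::fun syntax
pd <- getParseData(exprs)
qualified <- unique(pd$text[pd$token == "SYMBOL_PACKAGE"])

attached
qualified
```

Any package in `attached` that never shows up among the packages found for the script's function calls is a candidate for removal. Because function names are not exclusive to one package, it is worth checking such candidates by hand before deleting the `library()` line.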

Eric Green