
I have written this R package that takes ages (> 10s, sometimes up to 20-30s!) to load.

Every time the package loads, such as when building the package at the step "** testing if installed package can be loaded", or directly calling library("my.package"), nothing happens for 20s.

This makes everything painfully slow during development: building documentation, building the package, running R check...

Of course I have my suspicions (looking at you, dodgy dependency), but I need to gather evidence before axing it.

Is there a way to profile the loading of the package, in order to identify the cause? Or more generally, how can I figure out what is happening under the hood?

asachet
  • Maybe take a look at the times during build? Identify what takes a lot of time and fix? – NelsonGon Jul 16 '19 at 09:29
  • The slow step when building documentation is the loading (`"Loading my.package..."`). The slow step when building the package is "`** testing if installed package can be loaded`". Simply directly calling `library("my.package")` takes 10-20s. It really is the loading that seems to be slow. – asachet Jul 16 '19 at 10:19
  • Do you have anything that is done on load (attach?) or call some other packages at that step? – NelsonGon Jul 16 '19 at 10:31
  • @NelsonGon Not that I am aware of. My code in `/R` only defines functions and some small hard-coded vectors. I also have a small dataset in an `.rda` file in `/data`. I am pretty certain this comes from one of the dependencies doing things on attach, but I have over 30 so trial-and-error would be annoying (it's not just removing the package in DESCRIPTION but editing the NAMESPACE or removing the `@importFrom`, etc). I'd rather profile if possible, given that I can live with the issue for now. – asachet Jul 16 '19 at 10:42
  • Not sure if this helps but could take a look: https://stackoverflow.com/questions/10300769/how-to-load-packages-in-r-automatically or this(comment from Dason might be useful) https://stackoverflow.com/questions/42933479/how-to-speed-up-the-package-library-loading-time-in-r?noredirect=1&lq=1 Can't think of a more developer way. Maybe someone else knows a solution. – NelsonGon Jul 16 '19 at 11:16
  • Maybe you can interactively profile a call to `devtools::load_all`? – Alexis Jul 22 '19 at 17:46
  • Can't you use *profvis* on `loadNamespace()`? – moodymudskipper Jul 22 '19 at 23:42
  • @Alexis @Moody_Mudskipper Thanks for the suggestions. Unfortunately, the profiling shows a total duration of 500-700ms in both cases whereas the actual time is close to 20 seconds. Ignoring that, it does show that most of the time is spent in several `namespaceImportFrom` but it does not say which packages. Any other idea? – asachet Jul 23 '19 at 11:45
  • Have you shared your package on Github? If so, can you post the url? – Fred Boehm Jul 23 '19 at 21:38
  • Unfortunately this particular package is proprietary (developed at work) and we only publish our open-source packages. In any case, I think the question would benefit from a general answer rather than something specific to my package. – asachet Jul 24 '19 at 06:51
  • Maybe create an R script that calls `library` on those *30* packages and profile this R script? – chinsoon12 Jul 25 '19 at 03:08
  • If the load time is caused by loading some other package on which your package depends, you should be able to see that by first loading that package (`library(dependency)`) and then loading your package. – Jan van der Laan Jul 25 '19 at 11:07
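The suggestions in the last two comments can be sketched in a few lines of base R: attach each dependency in turn and record the elapsed time. Base packages are used below as placeholder names; substitute your package's actual dependencies.

```r
# Sketch: time attaching each dependency in turn in the current session.
# Caveat: order matters, because a shared dependency is only loaded by the
# first package that needs it, so later timings can look artificially fast.
deps <- c("stats", "utils")  # placeholders; substitute your Imports
for (pkg in deps) {
  elapsed <- system.time(library(pkg, character.only = TRUE))[["elapsed"]]
  message(sprintf("%s: %.2fs", pkg, elapsed))
}
```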

2 Answers


An issue with the detach method from @Davide-Lorino's answer is that if there are entangled Depends or Imports among the packages, detaching will fail, and fail hard.

A better method is to use a future backend that loads each of the imports in a clean R session and times how long each takes to load via `library()`.

I implemented this in a package that might be useful to others: https://github.com/rmflight/importedPackageTimings
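The core idea can be sketched without any extra packages by spawning a fresh `Rscript` process per dependency, so packages already attached in your session don't mask the cost. This is a simplified stand-in for the linked package, not its actual implementation; `shQuote` quoting is Unix-flavoured, and base packages are used as placeholder names.

```r
# Sketch: time library() for each import in a *fresh* R session.
time_in_fresh_session <- function(pkg) {
  # Build a one-liner that loads the package and prints the elapsed time
  expr <- sprintf("cat(system.time(library(%s))[['elapsed']])", pkg)
  out <- system2("Rscript", c("--vanilla", "-e", shQuote(expr)), stdout = TRUE)
  as.numeric(out)
}

# Placeholders; read your real dependencies from DESCRIPTION
timings <- vapply(c("stats", "utils"), time_in_fresh_session, numeric(1))
sort(timings, decreasing = TRUE)  # slowest first
```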

rmflight

You can determine which library takes the longest to load by benchmarking a call to `library()` for each of the libraries you are testing.

The key is to make sure that you unload the libraries between runs. If a library is already loaded, `library()` detects this and returns almost immediately. On a typical benchmark of 100 runs, only the first run measures the time it took to load your library; the remaining 99 measure how long `library()` takes to notice that the library is already loaded. The aggregated duration over the 100 runs is then a very small number with almost no variance between runs, like so:

[benchmark plot: near-zero timings with almost no variance, because the package was already loaded]

When what you really want looks more like:

[benchmark plot: realistic load times with visible variance, after detaching between runs]

Giving us a less surprising result for our efforts.

P.S. The `detach_package()` function is implemented like this:

detach_package <- function(pkg, character.only = FALSE)
{
  if(!character.only)
  {
    # Allow both detach_package(dplyr) and detach_package("dplyr", TRUE)
    pkg <- deparse(substitute(pkg))
  }
  search_item <- paste("package", pkg, sep = ":")
  # Loop in case the package appears on the search path more than once
  while(search_item %in% search())
  {
    detach(search_item, unload = TRUE, character.only = TRUE)
  }
}
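A minimal version of the benchmark loop can be written in base R alone: each iteration attaches the package and then detaches it, so every run pays the full load cost. `dplyr` here is just an example package name, and note that compiled code may stay loaded after a detach, so later runs can still be faster than a true cold start.

```r
# Sketch: load + unload together in each iteration, base R only.
runs <- replicate(10, {
  t <- system.time(library(dplyr))[["elapsed"]]
  # unload = TRUE can fail if another loaded package imports dplyr --
  # the "entangled imports" caveat from the other answer
  detach("package:dplyr", unload = TRUE, character.only = TRUE)
  t
})
summary(runs)
```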
Davide Lorino
  • Not quite as automated as I'd want but thanks! I think I'll use your idea but I'll read the dependencies from the DESCRIPTION file and test the packages programmatically. Also, that's assuming slowness issues come from the loading phase of another package... But at some point there must be something else happening. Assuming I go back the chain of slow packages until I find the culprit, how do I diagnose it? I'd welcome another more general answer explaining how to look at what's happening "under the hood". – asachet Jul 25 '19 at 13:17
  • Also given the magnitude of the delay observed (~10s) I might as well run a benchmark with `times=1` and not worry about detaching packages. – asachet Jul 25 '19 at 13:23
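The idea from the first comment, reading the dependencies programmatically from DESCRIPTION, can be sketched with `read.dcf()`. A synthetic DESCRIPTION is written to a temp file here for illustration; point `path` at your package's real DESCRIPTION instead.

```r
# Sketch: pull dependency names from the Imports/Depends fields of a
# DESCRIPTION file, so the timing loop needn't be maintained by hand.
path <- tempfile()
writeLines(c("Package: my.package",
             "Imports: dplyr (>= 1.0), ggplot2,",
             "    data.table"), path)

fields <- read.dcf(path, fields = c("Imports", "Depends"))
deps <- unlist(strsplit(na.omit(c(fields)), ","))
deps <- sub("\\s*\\(.*\\)", "", trimws(deps))  # drop version constraints
deps <- setdiff(deps, c("R", ""))              # Depends often lists R itself
deps
```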