35

I have a few convenience functions in my .Rprofile, such as a handy function for returning the size of objects in memory. Sometimes I like to clean out my workspace without restarting, and I do this with rm(list=ls()), which deletes all my user-created objects AND my custom functions. I'd really like not to blow up my custom functions.

One way around this seems to be creating a package with my custom functions so that my functions end up in their own namespace. That's not particularly hard, but is there an easier way to ensure custom functions don't get killed by rm()?
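For context, here is a minimal sketch of the kind of .Rprofile helper I mean (the name .ls.objects is made up for illustration):

.ls.objects <- function(pos = 1) {
    # list objects in the given environment and report their sizes, largest first
    obj <- ls(pos = pos)
    sizes <- sapply(obj, function(x) object.size(get(x, pos = pos)))
    sort(sizes, decreasing = TRUE)
}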

JD Long

7 Answers

41

Combine attach and sys.source to source into an environment and attach that environment. Here I have two functions in file my_fun.R:

foo <- function(x) {
    mean(x)
}

bar <- function(x) {
    sd(x)
}

Before I load these functions, they are obviously not found:

> foo(1:10)
Error: could not find function "foo"
> bar(1:10)
Error: could not find function "bar"

Create an environment and source the file into it:

> myEnv <- new.env()
> sys.source("my_fun.R", envir = myEnv)

They are still not visible, as we haven't attached anything yet:

> foo(1:10)
Error: could not find function "foo"
> bar(1:10)
Error: could not find function "bar"

and when we do attach it, the functions become visible; because we have attached a copy of the environment to the search path, they survive being rm()-ed:

> attach(myEnv)
> foo(1:10)
[1] 5.5
> bar(1:10)
[1] 3.027650
> rm(list = ls())
> foo(1:10)
[1] 5.5

I still think you would be better off with your own personal package, but the above might suffice in the meantime. Just remember that the copy on the search path is just that, a copy: if you edit the source file, you have to re-source and re-attach it. So this is useful if the functions are fairly stable and you're not editing them, but it is probably more hassle than it is worth if you are actively developing and modifying them.
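If you do edit the file, refreshing the attached copy is a three-step sketch (reusing myEnv and my_fun.R from above):

detach("myEnv")                          # drop the stale copy from the search path
sys.source("my_fun.R", envir = myEnv)    # re-source the edited file into the environment
attach(myEnv)                            # attach a fresh copy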

A second option is to just name them all .foo rather than foo, as ls() will not return names beginning with a dot unless the argument all.names = TRUE is set (the all = TRUE below works via partial matching):

> .foo <- function(x) mean(x)
> ls()
character(0)
> ls(all = TRUE)
[1] ".foo"         ".Random.seed"
Gavin Simpson

  • Personal packages are fine for this, although it still irks me that you have to jump through so many hoops to build a package. Why do I have to provide documentation for every function in a personal package? – David Heffernan Jan 28 '11 at 12:33
  • Probably because we don't want undocumented packages on CRAN, and if R Core were to allow some checks to be by-passed, they'd have to write a whole bunch of code to allow one to install and load a deficient package. There are user tools provided to help in writing informal packages - like roxygen - so you maintain a source file (without Rd files) and generate the package files from those. – Gavin Simpson Jan 28 '11 at 12:37
  • And you don't have to document every function. Just stick an \alias{} for each function into a single help file and that should be enough to defeat the checking. You don't need \usage{} sections etc., so don't provide them. This trick was used quite often for internal package functions before NAMESPACEs were much used. – Gavin Simpson Jan 28 '11 at 12:39
  • @Gavin The current blocks don't stop undocumented packages appearing on CRAN. As you point out, you can supply documentation that is empty. People document their work because it would be pointless to publish without. In my view, if it is desirable that certain standards be met on CRAN, then they should be enforced at the point of entry to CRAN. The current nannying approach is just user-hostile. – David Heffernan Jan 28 '11 at 13:46
  • @Gavin Now, this matters in a situation where you are developing and refactoring existing code. Yes, you may have used the \alias{} trick, but what if you rename and refactor a bunch of functions? You then need to update the documentation to match. Then you find it doesn't do what you need, so you try something else, and have to change the docs again. These distractions make development work more difficult, and I speak as a developer rather than a statistician. – David Heffernan Jan 28 '11 at 13:48
  • @Gavin Having said all that (and I do realise you are not personally responsible for R!!), your suggestion regarding environments is excellent and it has just received my up-vote!! – David Heffernan Jan 28 '11 at 13:50
  • @david, you should try building a package using Roxygen for documentation. Using ESS, adding the documentation for a function is as easy as C-c C-o. And since it's inline with the code it does not require switching to another file to update the documentation. I find it so useful I use Roxygen to document functions even when I'm not building a package. – JD Long Jan 28 '11 at 14:12
  • @JD Long But I don't want to build anything whilst I am developing. I just want to make a change and run it instantly. If I wanted to have to build something before I ran it I'd code in C++!! Do you see where I'm coming from? As for Roxygen, I like inline documentation, but there are phases of the development process when I want to ignore the documentation and focus on the code. – David Heffernan Jan 28 '11 at 14:17
  • @david, yup, the only thing that really matters is that you find a workflow that works for you. – JD Long Jan 28 '11 at 16:05
  • You might want to check out the mvbutils package, which claims the following: "Hierarchical workspace tree, code editing and backup, easy package prep, editing of packages while loaded" and many more. More than one of those items looks relevant to your interests. – Spacedman Jan 28 '11 at 19:18
  • @david: As far as I know there is nothing that stops an undocumented package from being installed in R. `R CMD build` and `R CMD INSTALL` still work---just remove the documentation stubs that `package.skeleton()` places in the `man` folder. These commands do give warning messages to help CRAN maintainers filter out packages that are not ready for public release. – Sharpie Jan 28 '11 at 19:37
  • @Spacedman Thanks that looks interesting. I like the sound of "editing of packages while loaded". @Sharpie `R CMD build` and `R CMD INSTALL` take time. I'm looking to be able to have my new code running the second I've finished it - that's what I'm used to in my day job! – David Heffernan Jan 28 '11 at 19:41
  • @David: Personally, I use a Makefile that calls the `R CMD` commands with options such as `--no-docs`, `--no-vignettes`, etc. that speed up their execution. Hadley's [devtools](https://github.com/hadley/devtools) package can facilitate the reloading of new package code without restarting R although it is still a little rough around the edges with respect to some package types. – Sharpie Jan 28 '11 at 20:09
  • @Sharpie I've got a make file like that. Mostly though I find it easiest just to `source` all my files with a `glob` but that makes me feel a tad *dirty*. It used to be better! – David Heffernan Jan 28 '11 at 20:15
24

Here are two ways:

1) Have each of your function names start with a dot, e.g. .f instead of f. ls will not list such functions unless you use ls(all.names = TRUE), so they won't be passed to your rm command.

or,

2) Put this in your .Rprofile:

attach(list(
   f = function(x) x, 
   g = function(x) x*x
), name = "MyFunctions")

The functions will appear as a component named "MyFunctions" on your search list rather than in your workspace, and they will be accessible almost as if they were in your workspace. search() will display your search list and ls("MyFunctions") will list the names of the functions you attached. Since they are not in your workspace, the rm command you normally use won't remove them. If you do wish to remove them, use detach("MyFunctions").
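A short transcript of what that looks like (a sketch, assuming the attach() call above has run from your .Rprofile):

> ls("MyFunctions")
[1] "f" "g"
> rm(list = ls())
> f(2)
[1] 2
> detach("MyFunctions")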

G. Grothendieck
10

Gavin's answer is wonderful, and I just upvoted it. Merely for completeness, let me toss in another one:

R> q("no")

followed by

M-x R

to create a new session (here via Emacs with ESS), which re-reads the .Rprofile. Easy, fast, and cheap.

Other than that, private packages are the way in my book.

Dirk Eddelbuettel
3

Another alternative: keep the functions in a separate file which is sourced from within your .Rprofile. You can re-source the contents directly from within R at your leisure.
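A minimal sketch of that setup (the path is hypothetical):

# in .Rprofile:
source("~/R/my_functions.R")

# later, after rm(list = ls()) has wiped the functions, just re-source:
source("~/R/my_functions.R")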

Richie Cotton
2

I find that often my R environment gets cluttered with various objects when I'm creating or debugging a function. I wanted a way to efficiently keep the environment free of these objects while retaining personal functions.

The simple function below was my solution. It does two things: 1) deletes all non-function objects whose names do not begin with a capital letter, and then 2) saves the environment as an RData file.

(requires the R.oo package)

cleanup <- function(filename = "C:/mymainR.RData") {
    library(R.oo)
    # create a data frame listing all personal objects
    everything <- ll(envir = 1)
    # get the objects that are not functions
    nonfunction <- as.vector(everything[everything$data.class != "function", 1])
    # non-function objects whose names begin with a lowercase letter get deleted
    # (the ^ anchor is needed: an unanchored pattern would also match capital-initial
    # names that merely contain a lowercase letter)
    trash <- nonfunction[grep("^[[:lower:]]", nonfunction)]
    remove(list = trash, pos = 1)
    # save the R environment
    save.image(filename)
    print(paste("New, CLEAN R environment saved in", filename))
}

In order to use this function, three rules must always be followed:
1) Keep all data external to R.
2) Use names that begin with a capital letter for non-function objects that I want to keep permanently available.
3) Remove obsolete functions manually with rm.
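A hypothetical session illustrating the rules (object names made up; assumes cleanup() above is defined and R.oo is installed):

> MyKeepers <- data.frame(x = 1:3)   # capital first letter: kept
> tmp <- rnorm(10)                   # lowercase first letter: deleted
> cleanup()
[1] "New, CLEAN R environment saved in C:/mymainR.RData"
> ls()
[1] "cleanup"   "MyKeepers"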

Obviously this isn't a general solution for everyone...and potentially disastrous if you don't live by rules #1 and #2. But it does have numerous advantages: a) fear of my data getting nuked by cleanup() keeps me disciplined about using R exclusively as a processor and not a database, b) my main R environment is so small I can back it up as an email attachment, c) new functions are automatically saved (I don't have to manually manage a list of personal functions) and d) all modifications to preexisting functions are retained. Of course the best advantage is the most obvious one...I don't have to spend time doing ls() and reviewing objects to decide whether they should be rm'd.

Even if you don't care for the specifics of my system, the "ll" function in R.oo is very useful for this kind of thing. It can be used to implement just about any set of clean up rules that fit your personal programming style.

Patrick Mohr

0

An nth, quick-and-dirty option would be to use lsf.str() when calling rm(), to get all the functions in the current workspace and keep them out of the deletion list... and it lets you name your functions however you wish.

# anchor each function name at both ends so only exact matches are kept
pattern <- paste0('^', lsf.str(), '$', collapse = "|")
rm(list = ls()[!grepl(pattern, ls())])

I agree, it may not be the best practice, but it gets the job done! (and I have to selectively clean after myself anyway...)
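For what it's worth, a regex-free sketch of the same idea (same caveats apply):

# keep everything lsf.str() reports as a function, drop the rest
rm(list = setdiff(ls(), lsf.str()))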

Antoine Lizée
0

Similar to Gavin's answer, the following loads a file of functions without leaving an extra environment object around:

if ('my_namespace' %in% search()) detach('my_namespace')       # drop any stale copy first
source('my_functions.R', attach(NULL, name = 'my_namespace'))  # attach empty env, source into it

This removes the old version of the namespace if it was attached (useful for development), then attaches an empty new environment called my_namespace and sources my_functions.R into it. If you don't remove the old version, you will build up multiple attached environments of the same name.

Should you wish to see which functions have been loaded, look at the output for

ls('my_namespace')

To unload, use

detach('my_namespace')

These attached functions, like a package, will not be deleted by rm(list=ls()).

ds440