A solution might be to use S4 methods and let R's internal dispatcher do the work for you (see the example below). That way, you're somewhat "bulletproof" when it comes to systematically updating your code without running the risk of breaking something.
Key benefits
The key thing here is that S4 methods support multiple dispatch: the method that gets called is selected based on the classes of several arguments, not just the first one. That way your function will always be foo() (as opposed to having to keep track of foo1(), foo2() etc.), while new functionality can easily be implemented (by adding respective methods) without touching "old" methods that other people/packages might rely on.
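Here is a minimal sketch of what multiple dispatch means in practice. It assumes a hypothetical generic combine() that is not part of the example further down. R picks the method from the classes of both arguments, so a single function name covers several variants.

```r
library(methods)

# Hypothetical generic "combine" that dispatches on the classes of BOTH arguments
setGeneric("combine", function(a, b) standardGeneric("combine"))

# numeric + numeric: add the two values
setMethod("combine", signature("numeric", "numeric"),
          function(a, b) a + b)

# character + numeric: repeat the string b times
setMethod("combine", signature("character", "numeric"),
          function(a, b) strrep(a, b))

combine(1, 2)      # numeric/numeric method: 3
combine("ab", 3)   # character/numeric method: "ababab"
```

Adding a third variant later would just be another setMethod() call. The two existing methods stay untouched.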
Key functions you'll need:
setGeneric()
setMethod()
setRefClass() (S4 Reference Classes; personal recommendation) or setClass() (S4 classes; I wouldn't use them for the reason described in the "Additional remarks" at the very end)
The "downsides"
You need to switch from S3 to S4 logic.
This implies that you need to write a bit more code than you might be used to (generic method definitions, method definitions and possibly your own class definitions; see the example below). But this "buys" you and your code much more structure and makes it more robust.
It might also imply that you'll eventually dig deeper and deeper into the world of Object-Oriented Programming or Object-Oriented Design. I personally consider this a good thing (my rule of thumb: the more complex or distributed your application, the better off you are using OOP). Some would consider these approaches atypical for R, which I strongly disagree with, since R has superb OO features maintained by the R Core Team. Others would call them "unsuited" for R, which might be true depending on how much you rely on "non-OOP" packages/code. If you're willing to go that way, you might want to familiarize yourself with the SOLID principles of Object-Oriented Design. You might also want to check out the following books: The Clean Coder and The Pragmatic Programmer.
If computational efficiency is really critical (e.g. when estimating statistical models), using S4 methods and S4 Reference Classes might slow you down a bit. After all, there's more machinery involved than with S3. But I'd recommend measuring the impact case by case via system.time() and/or microbenchmark::microbenchmark() instead of picking "ideological" sides (S3 vs. S4).
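Below is a rough sketch of such a case-by-case comparison using base R's system.time(). add_s3() and add_s4() are hypothetical stand-ins for your own code. Absolute timings depend entirely on your machine and R version, so treat the numbers only as a relative indication.

```r
library(methods)

# Hypothetical S3 generic and method doing the same work as foo()
add_s3 <- function(x, y) UseMethod("add_s3")
add_s3.numeric <- function(x, y) x + y

# Equivalent hypothetical S4 generic and method
setGeneric("add_s4", function(x, y) standardGeneric("add_s4"))
setMethod("add_s4", signature("numeric", "numeric"), function(x, y) x + y)

n <- 1e4
# Base R timing of n calls each; if the package is installed,
# microbenchmark::microbenchmark(add_s3(1, 2), add_s4(1, 2)) gives finer-grained numbers
t_s3 <- system.time(for (i in seq_len(n)) add_s3(1, 2))
t_s4 <- system.time(for (i in seq_len(n)) add_s4(1, 2))
c(S3 = t_s3[["elapsed"]], S4 = t_s4[["elapsed"]])
```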
Example
Initial function
Suppose you're in department A and someone on your team started out by creating a function called foo():
foo <- function(x, y) {
x + y
}
foo(x=10, y=20)
First change request
You would like to be able to extend it without breaking "old" code that relies on foo().
Now, I think we all agree that this can be quite hard to do.
You either need to explicitly modify the source code of foo() (each time running the risk of breaking something that used to work; this violates the "O" in SOLID: the Open-Closed Principle) or you need to come up with alternative names such as foo1(), foo2() etc. (it gets really hard to keep track of which function is doing what):
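# Option 1: modify foo() itself and switch behavior via a new argument
foo <- function(x, y, type=c("old", "new")) {
type <- match.arg(type, choices=c("old", "new"))
if (type == "old") {
x + y
} else if (type == "new") {
x * y
}
}
foo(x=10, y=20)
[1] 30
foo(x=10, y=20, type="new")
[1] 200
# Option 2: leave foo() alone and introduce a new name for the new behavior
foo1 <- function(x, y) {
x * y
}
foo1(x=10, y=20)
[1] 200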
Let's see how S4 methods and multiple dispatch can really help us out here.
Generic method
You need to start out by turning foo() into a generic method.
setGeneric(
name="foo",
signature=c("x", "y", ".ctx", ".ns"),
def=function(x, y, ..., .ctx, .ns) {
standardGeneric("foo")
}
)
In simplified words: a generic method itself doesn't do anything yet. It's simply a precondition for being able to specify "actual" methods for its signature arguments that do something useful.
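Here is a small sketch of that point using a hypothetical generic bar(). Right after setGeneric() the generic exists, but calling it fails until at least one method has been registered with setMethod().

```r
library(methods)

# A generic with no methods yet ("bar" is a hypothetical name)
setGeneric("bar", function(x, ...) standardGeneric("bar"))

isGeneric("bar")              # TRUE: the generic exists ...
# ... but calling it raises an error, since no method is registered yet
res <- tryCatch(bar(1), error = function(e) e)
inherits(res, "error")        # TRUE
conditionMessage(res)         # explains that no (inherited) method could be found

# Once a method is registered, the same call works
setMethod("bar", "numeric", function(x, ...) x * 2)
bar(1)                        # 2
```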
Signature arguments
The degree of flexibility with respect to the original problem is directly linked to the number of signature arguments you declare (signature=c("x", "y", ".ctx", ".ns")). The more signature arguments, the more flexibility you have, but the more complex your code might get as well (with respect to how much code you have to write).
Again, in simplified words: signature arguments (and their classes) are used by the method dispatcher to retrieve the correct method, i.e. the one that does the actual work.
Think of the method dispatcher as the clerk in a ski rental business. You present him with an arbitrarily large set of signature information (i.e. information that "clearly distinguishes you from others": your age, height, shoe size and skill level), and he uses that information to provide you with the right equipment to hit the slopes. R's method dispatcher is the clerk with access to the storage room of the ski rental, except that instead of ski equipment it returns methods.
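To make the clerk analogy a bit more tangible, here is a sketch using a hypothetical generic rent(). existsMethod() checks whether a method is registered for exactly the given class. selectMethod() performs the same lookup that the dispatcher does when the generic is called, including class inheritance. Because integer extends numeric in S4, an integer "customer" is served by the numeric method.

```r
library(methods)

# Hypothetical generic "rent" with one method registered for "numeric"
setGeneric("rent", function(customer) standardGeneric("rent"))
setMethod("rent", "numeric", function(customer) "numeric equipment")

existsMethod("rent", "numeric")       # TRUE: registered for exactly this class
existsMethod("rent", "integer")       # FALSE: nothing registered for "integer" itself ...
m <- selectMethod("rent", "integer")  # ... but the lookup finds the inherited "numeric" method
m(5L)                                 # "numeric equipment"
rent(5L)                              # same result via normal dispatch
```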
Notice that we declared our "old" arguments x and y to be signature arguments from now on, and that there are also two new arguments: .ctx and .ns. I'll get to these in a minute. It's those arguments that will provide us with the flexibility we're after.
Initial method definition
We now define a "variant" (a method) of the generic method for the following "signature scenario":
x is numeric
y is numeric
.ctx will just not be provided when calling the method and is thus missing
.ns will just not be provided when calling the method and is thus missing
Think of it as registering your signature information with specific equipment at the ski rental. Once you've done that and ask for your equipment, the only thing the clerk has to do is go to the storage room and look up which equipment is linked to your personal information.
setMethod(
f="foo",
signature=signature(x="numeric", y="numeric", .ctx="missing", .ns="missing"),
definition=function(x, y, ..., .ctx, .ns) {
x + y
}
)
When we call foo() with this "signature scenario" (asking for the method that we registered for this scenario), the method dispatcher knows exactly which actual method it needs to get out of the storage room:
foo(x=10, y=20)
[1] 30
First update
Now someone from department B comes along, looks at foo(), likes it, but decides that foo() needs to be updated (x * y instead of x + y) if it is to be used in his department.
That's when .ctx (short for context) comes into play: it's an argument by which we are able to distinguish application contexts.
Defining a class that represents the new application context
setRefClass("ApplicationContextDepartmentB")
When calling foo(), we'll provide it with an instance of this class (.ctx=new("ApplicationContextDepartmentB")).
Defining a new method for the new application context
Notice how we register signature argument .ctx to our new class ApplicationContextDepartmentB:
setMethod(
f="foo",
signature=signature(x="numeric", y="numeric",
.ctx="ApplicationContextDepartmentB", .ns="missing"),
definition=function(x, y, ..., .ctx, .ns) {
out <- x * y
attributes(out)$description <- "I'm different from the original foo()"
return(out)
}
)
That way, the method dispatcher knows exactly that it should return the "new" method instead of the "old" method when we call foo() like this:
foo(x=1, y=10, .ctx=new("ApplicationContextDepartmentB"))
[1] 10
attr(,"description")
[1] "I'm different from the original foo()"
The "old" method is not affected at all:
foo(x=1, y=10)
[1] 11
Second update
Suppose that someone from department C comes along and suggests yet another "configuration" or version of foo(). You can easily provide that without breaking anything you've implemented for departments A and B so far by following the same routine as for department B.
But we'll even take it one step further here: we'll define two additional classes that let us distinguish different "namespaces" (that's where .ns comes into play).
Think of namespaces as a way of distinguishing different runtime scenarios for a specific method in a specific application context (e.g. "test" and "production" mode).
Defining the classes
setRefClass("ApplicationContextDepartmentC")
setRefClass("TestNamespace")
setRefClass("ProductionNamespace")
Defining a new method for the new application context and a "test" scenario
Notice how we register signature argument .ctx to our new class ApplicationContextDepartmentC and .ns to our new class TestNamespace:
setMethod(
f="foo",
signature=signature(x="character", y="numeric",
.ctx="ApplicationContextDepartmentC", .ns="TestNamespace"),
definition=function(x, y, ..., .ctx, .ns) {
data.frame(x, y, test.ok=rep(TRUE, length(x)))
}
)
Again, the method dispatcher will look up the correct method when calling foo() like this:
foo(x=letters[1:5], y=11:15, .ctx=new("ApplicationContextDepartmentC"),
.ns=new("TestNamespace"))
x y test.ok
1 a 11 TRUE
2 b 12 TRUE
3 c 13 TRUE
4 d 14 TRUE
5 e 15 TRUE
Defining a new method for the new application context and a "production" scenario
setMethod(
f="foo",
signature=signature(x="character", y="numeric",
.ctx="ApplicationContextDepartmentC", .ns="ProductionNamespace"),
definition=function(x, y, ..., .ctx, .ns) {
data.frame(x, y)
}
)
We tell the method dispatcher that we now want the method registered for this scenario or namespace like this:
foo(x=letters[1:5], y=11:15, .ctx=new("ApplicationContextDepartmentC"),
.ns=new("ProductionNamespace"))
x y
1 a 11
2 b 12
3 c 13
4 d 14
5 e 15
Notice that you're free to use the classes TestNamespace and ProductionNamespace anywhere you'd like. These classes are not bound to ApplicationContextDepartmentC in any way, so you can, for example, also use them for all your other application scenarios.
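Here is a sketch of such reuse. It assumes a hypothetical generic report() with the same signature layout as foo(). A separate name is used so that running the sketch doesn't redefine foo() and wipe its methods. The same TestNamespace class is combined with both department B and department C.

```r
library(methods)

# Same context/namespace classes as in the example (redefining them is harmless)
setRefClass("ApplicationContextDepartmentB")
setRefClass("ApplicationContextDepartmentC")
setRefClass("TestNamespace")

# Hypothetical generic with the same signature layout as foo()
setGeneric("report", signature = c("x", ".ctx", ".ns"),
           def = function(x, ..., .ctx, .ns) standardGeneric("report"))

# TestNamespace combined with department B ...
setMethod("report",
          signature(x = "numeric", .ctx = "ApplicationContextDepartmentB",
                    .ns = "TestNamespace"),
          function(x, ..., .ctx, .ns) paste("dept B, test run:", sum(x)))

# ... and the very same TestNamespace class combined with department C
setMethod("report",
          signature(x = "numeric", .ctx = "ApplicationContextDepartmentC",
                    .ns = "TestNamespace"),
          function(x, ..., .ctx, .ns) paste("dept C, test run:", mean(x)))

test_ns <- new("TestNamespace")
report(1:4, .ctx = new("ApplicationContextDepartmentB"), .ns = test_ns)  # "dept B, test run: 10"
report(1:4, .ctx = new("ApplicationContextDepartmentC"), .ns = test_ns)  # "dept C, test run: 2.5"
```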
Additional remarks for method definitions
Something that's often quite useful is to start out with a method that accepts ANY class for its signature arguments and to define more restrictive methods as your software evolves:
setMethod(
f="foo",
signature=signature(x="ANY", y="ANY", .ctx="missing", .ns="missing"),
definition=function(x, y, ..., .ctx, .ns) {
message("Value of x:")
print(x)
message("Value of y:")
print(y)
}
)
foo(x="Hello World!", y=rep(TRUE, 3))
Value of x:
[1] "Hello World!"
Value of y:
[1] TRUE TRUE TRUE
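Here is a minimal sketch of that evolution using a hypothetical generic describe(). The ANY method serves as a catch-all. Once a more specific method is registered, the dispatcher prefers it for matching classes, while everything else still falls through to ANY.

```r
library(methods)

# Hypothetical generic "describe" with a catch-all method for ANY class ...
setGeneric("describe", function(x) standardGeneric("describe"))
setMethod("describe", "ANY", function(x) "fallback: no specific method yet")

describe(TRUE)   # "fallback: no specific method yet"
describe(1)      # "fallback: no specific method yet"

# ... later, a more restrictive method is added for numeric input
setMethod("describe", "numeric",
          function(x) paste("numeric of length", length(x)))

describe(1:3)    # "numeric of length 3": the more specific method wins
describe(TRUE)   # still handled by the ANY method
```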
Additional remarks for class definitions
I prefer S4 Reference Classes over S4 Classes because of the self-referencing capabilities of S4 Reference Classes:
setRefClass(
Class="A",
fields=list(
x1="numeric",
x2="logical"
),
methods=list(
getX1=function() {
.self$x1
},
getX2=function() {
.self$x2
},
setX1=function(x) {
.self$x1 <- x
},
setX2=function(x) {
.self$field("x2", x)
},
addX1AndX2=function() {
.self$getX1() + .self$getX2()
}
)
)
x <- new("A", x1=10, x2=TRUE)
x$getX1()
[1] 10
x$getX2()
[1] TRUE
x$addX1AndX2()
[1] 11
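Regular S4 classes (setClass()) don't have that feature. They have copy-on-modify semantics, so a method can't change the object it was called on in place.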
Subsequent modifications of field values:
x$setX1(100)
x$addX1AndX2()
[1] 101
x$x1 <- 1000
x$addX1AndX2()
[1] 1001
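For comparison, here is a sketch of the same idea with a regular S4 class. The class name B and the generic updateX1() are hypothetical. Because S4 objects are copied on modification, a setter has to return the modified object, and the caller has to reassign it.

```r
library(methods)

# Plain S4 class with the same fields as "A" ("B" is a hypothetical name)
setClass("B", slots = c(x1 = "numeric", x2 = "logical"))

# A setter for a plain S4 class has to return the modified object
setGeneric("updateX1", function(obj, value) standardGeneric("updateX1"))
setMethod("updateX1", "B", function(obj, value) {
  obj@x1 <- value  # changes a local copy of obj only ...
  obj              # ... so the copy must be returned
})

b <- new("B", x1 = 10, x2 = TRUE)
b2 <- updateX1(b, 100)
b@x1    # 10: the original object is unchanged
b2@x1   # 100: only the returned copy carries the new value
```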
Additional remarks for documenting methods and classes
I strongly recommend using the packages roxygen2 and devtools to document your methods and classes. You might also want to look into the package roxygen3.
Documenting generic methods with roxygen2:
#' Foo
#'
#' This method takes \code{x} and \code{y} and adds them.
#'
#' Some details here
#'
#' @param x \strong{Signature argument}.
#' @param y \strong{Signature argument}.
#' @param ... Further arguments to be passed to subsequent functions.
#' @param .ctx \strong{Signature argument}.
#' Application context.
#' @param .ns \strong{Signature argument}.
#' Application namespace. Usually used to distinguish different context
#' versions or configurations.
#' @author Janko Thyson \email{john.doe@@something.com}
#' @references \url{http://www.something.com/}
#' @example inst/examples/foo.R
#' @docType methods
#' @rdname foo-methods
#' @export
setGeneric(
name="foo",
signature=c("x", "y", ".ctx", ".ns"),
def=function(x, y, ..., .ctx, .ns) {
standardGeneric("foo")
}
)
Documenting methods with roxygen2:
#' @param x \code{\link{character}}. Character vector.
#' @param y \code{\link{numeric}}. Numerical vector.
#' @param .ctx \code{\link{ApplicationContextDepartmentC}}.
#' @param .ns \code{\link{ProductionNamespace}}.
#' @return \code{\link{data.frame}}. Some data frame.
#' @rdname foo-methods
#' @aliases foo,character,numeric,ApplicationContextDepartmentC,ProductionNamespace-method
#' @export
setMethod(
f="foo",
signature=signature(x="character", y="numeric",
.ctx="ApplicationContextDepartmentC", .ns="ProductionNamespace"),
definition=function(x, y, ..., .ctx, .ns) {
data.frame(x, y)
}
)