Class systems
Check out the three class systems in R, S3, S4, and Reference classes.
## S3 methods, Section 5 of
RShowDoc("R-lang")
## S4 classes
?Classes
?Methods
## Reference classes
?ReferenceClasses
With a Java background you'll be tempted to go with reference classes, but these have 'reference semantics' and action at a distance (changing one object changes another that refers to the same data), whereas most R users expect 'copy on change' semantics. One can make great progress with S3 classes, but a more disciplined approach would in my opinion adopt S4. Features of S4 will surprise you, in part because the class system is closer to common lisp object system than to java.
There are other opinions and options.
Basic implementation
I'm not really sure what your design goal with `ProcessData' is; I would implement your two classes as a class, a generic, and a method for the generic that operates on the MyClass class.
## definition and 'low-level' constructor
.MyClass <- setClass("MyClass", representation(word="character"))
## definition of a generic
setGeneric("processData", function(x, ...) standardGeneric("processData"))
setMethod("processData", "MyClass", function(x, ...) {
cat("processData(MyClass) =", x@word, "\n")
})
This is complete and fully functional
> myClass <- .MyClass(word="hello world")
> processData(myClass)
processData(MyClass) = hello world
The three code lines might be placed in two files, "AllGenerics.R" and "MyClass.R" (including the method) or three files "AllGenerics.R", "AllClasses.R", "processData-methods.R" (note that methods are associated with generics, and dispatch on class).
Additional implementation
One would normally add a more user-friendly constructor, e.g., providing hints to the user about expected data types or performing complex argument initialization steps
MyClass <- function(word=character(), ...)
{
.MyClass(word=word, ...)
}
Typically one wants a slot accesssor, rather than direct slot access. This can be a simple function (as illustrated) or a generic + method.
word <- function(x, ...) x@word
If the slot is to be updated, then one writes a replacement function or method. The function or method usually has three arguments, the object to be updated, possible additional arguments, and the value to update the object with. Here's a generic + method implementation
setGeneric("word<-", function(x, ..., value) standardGeneric("word<-"))
setReplaceMethod("word", c("MyClass", "character"), function(x, ..., value) {
## note double dispatch on x=MyClass, value=character
x@word <- value
x
})
A somewhat tricky alternative implementation is
setReplaceMethod("word", c("MyClass", "character"), function(x, ..., value) {
initialize(x, word=value)
})
which uses the initialize
generic and default method as a copy constructor; this can be efficient if updating multiple slots at the same time.
Because the class is seen by users, one wants to display it in a user-friendly way using a 'show' method, for which a generic (getGeneric("show")
) already exists
setMethod("show", "MyClass", function(object) {
cat("class:", class(object), "\n")
cat("word:", word(object), "\n")
})
And now our user session looks like
> myClass
class: MyClass
word: hello world
> word(myClass)
[1] "hello world"
> word(myClass) <- "goodbye world"
> processData(myClass)
processData(MyClass) = goodbye world
Efficiency
R works efficiently on vectors; S4 classes are no exception. So the design is that each slot of a class represents a column spanning many rows, rather than the element of a single row. We're expecting the slot 'word' to typically contain a vector of length much greater than 1, and for operations to act on all elements of the vector. So one would write methods with this in mind, e.g., modifying the show method to
setMethod("show", "MyClass", function(object) {
cat("class:", class(object), "\n")
cat("word() length:", length(word(object)), "\n")
})
Here are larger data objects (using files on my Linux system)
> amer <- MyClass(readLines("/usr/share/dict/american-english"))
> brit <- MyClass(readLines("/usr/share/dict/british-english"))
> amer
class: MyClass
word() length: 99171
> brit
class: MyClass
word() length: 99156
> sum(word(amer) %in% word(brit))
[1] 97423
> amer_uc <- amer ## no copy, but marked to be copied if either changed
> word(amer_uc) <- toupper(word(amer_uc)) ## two distinct objects
and all of this is quite performant.
Hazards of reference class 'action-at-a-distance'
Let's rewind to a simpler implementation of the S4 class, with direct slot access and no fancy constructors. Here's the American dictionary and a copy, transformed to upper case
.MyClass <- setClass("MyClass", representation(word="character"))
amer <- .MyClass(word=readLines("/usr/share/dict/american-english"))
amer_uc <- amer
amer_uc@word <- toupper(amer_uc@word)
Note that we've upper-cased amer_uc
but not amer
:
> amer@word[99 + 1:10]
[1] "Adana" "Adar" "Adar's" "Addams" "Adderley"
[6] "Adderley's" "Addie" "Addie's" "Addison" "Adela"
> amer_uc@word[99 + 1:10]
[1] "ADANA" "ADAR" "ADAR'S" "ADDAMS" "ADDERLEY"
[6] "ADDERLEY'S" "ADDIE" "ADDIE'S" "ADDISON" "ADELA"
This is really what R users are expecting -- I've created a separate object and modified it; the original object is unmodified. This is an assertion on my part; maybe I don't know what R users expect. I'm assuming an R user isn't really paying attention to the fact that this is a reference class, but thinks it's just another R object like an integer()
vector or a data.frame
or the return value of lm()
.
In contrast, here's a minimal implementation of a reference class, and similar operations
.MyRefClass <- setRefClass("MyRefClass", fields = list(word="character"))
amer <- .MyRefClass(word=readLines("/usr/share/dict/american-english"))
amer_uc <- amer
amer_uc$word <- toupper(amer_uc$word)
But now we've changed both amer
and amer_uc
! Completely expected by C or Java programmers, but not by R users.
> amer$word[99 + 1:10]
[1] "ADANA" "ADAR" "ADAR'S" "ADDAMS" "ADDERLEY"
[6] "ADDERLEY'S" "ADDIE" "ADDIE'S" "ADDISON" "ADELA"
> amer_uc$word[99 + 1:10]
[1] "ADANA" "ADAR" "ADAR'S" "ADDAMS" "ADDERLEY"
[6] "ADDERLEY'S" "ADDIE" "ADDIE'S" "ADDISON" "ADELA"