
I started to implement a kind of numbers in R. I have functions to add them, multiply them, etc. Now I want to provide a convenient interface for arithmetic on these numbers. That is, I don't want the user to type multiply(x, add(y, z)), but x * (y + z) instead, etc. What is the best way to achieve this in terms of efficiency, S3 or S4? I already did such an arithmetic implementation in S4 for a package (lazyNumbers), and it was a bit long, a bit "verbose". Is it more comfortable in S3? I don't know how to do it with S3 yet, but I'll learn if needed.

Stéphane Laurent
  • So are you basically creating an algebra (in the mathematical meaning, where "set" goes to "group" goes to "ring" and so on)? – Carl Witthoft Apr 12 '23 at 15:24
  • According to this post, S3 would be the "easiest way to override operators": https://stackoverflow.com/a/70772705/20513099 Admittedly, the technical reason is beyond me :-) – I_O Apr 12 '23 at 15:26
  • @CarlWitthoft Yes. A *field*, more precisely. – Stéphane Laurent Apr 12 '23 at 15:29
  • Here's one "how": `vec <- 99; class(vec) <- "mymath"; \`+.mymath\` <- function(a,b) { cat("hello\n"); unclass(a)+b; }; vec + 9;`. (The purpose of `unclass` is to avoid recursive calls.) As to which of S3/S4 is more efficient? It might depend on the complexity of your needs, but S4 allows control over more than just the LHS class. I'm assuming you've read http://adv-r.had.co.nz/OO-essentials.html? – r2evans Apr 12 '23 at 16:10
  • @I_O well, one person answered with that claim, which is not necessarily global gospel. – Carl Witthoft Apr 12 '23 at 16:13
  • (and I did not know we had tags r-s3 and r-s4, interesting) – r2evans Apr 12 '23 at 16:16
  • FWIW my impression is that the biggest difference is that `S4` has multiple dispatch capability and `S3` doesn't – Carl Witthoft Apr 12 '23 at 16:24
  • @r2evans Not to be confused with `rss` and `rss2` :-) – Carl Witthoft Apr 12 '23 at 16:25
  • If you are only ever adding two objects of the same class, then S3 will suffice. But an argument for S4 even in that case is the validation mechanism. S4 classes have validation built in. – Mikael Jagan Apr 12 '23 at 17:51
  • @r2evans Sort of recent: https://meta.stackoverflow.com/questions/423825 – Mikael Jagan Apr 12 '23 at 17:52
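r2evans's one-liner, expanded into a runnable sketch. The class name `mymath` and the `cat("hello\n")` trace come from that comment; the conventional argument names `e1`/`e2` and the re-wrapping of the result with `structure()` are my own choices. Because `+` belongs to the Ops group, R looks for a method on either argument, so `9 + vec` reaches the same method as `vec + 9`.

```r
## S3 method for the binary `+` on a toy class "mymath"
`+.mymath` <- function(e1, e2) {
  cat("hello\n")  # shows that the method was actually called
  # unclass() both operands so the inner `+` uses the internal method
  # instead of dispatching back to `+.mymath` (infinite recursion)
  structure(unclass(e1) + unclass(e2), class = "mymath")
}

vec <- structure(99, class = "mymath")
vec + 9  # prints "hello", then 108 with class "mymath"
9 + vec  # also prints "hello": Ops dispatch considers both arguments
## note: unary +vec is not handled here, since e2 would be missing
```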

2 Answers


The answer will depend on how your "numbers" operate, but I'll try to identify the strengths and weaknesses of each approach here so you can make up your own mind.

S3

  • for ordinary generics (those that call UseMethod()), dispatches only on the class() of the first argument. (Update: members of the Ops group, which includes `+`, are an exception: they consider the class of both arguments, so if there is a `+.myclass` or `Ops.myclass` function, it will be called for both `1 + x` and `x + 1`. However, for `x + y` where `x` and `y` have different classes with different methods, R warns about incompatible methods and falls back to the internal default method, which will presumably not do what you want.)
  • I believe it's quicker as there are fewer checks, but I haven't actually tested it.
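A small sketch of the two-class case from the update above, using two hypothetical classes `aaa` and `bbb`, each with its own `Ops` method. Because the operands resolve to different methods, R emits an "Incompatible methods" warning and uses the internal `+`, so neither method's return value is produced. (R 4.3.0 added the `chooseOpsMethod()` generic, which lets a class resolve such conflicts; this sketch defines no such method.)

```r
## two hypothetical classes, each with its own Ops method
Ops.aaa <- function(e1, e2) "Ops.aaa was called"
Ops.bbb <- function(e1, e2) "Ops.bbb was called"

x <- structure(1, class = "aaa")
y <- structure(2, class = "bbb")

warning_msg <- NULL
res <- withCallingHandlers(
  x + y,
  warning = function(w) {
    warning_msg <<- conditionMessage(w)  # keep the warning text
    invokeRestart("muffleWarning")
  }
)
warning_msg   # the "Incompatible methods" warning
unclass(res)  # 3: the internal `+` was used, not either method
```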

S4

  • dispatches on the class() of all arguments in the generic's signature (multiple dispatch)
  • will take more time as it has to look up the whole methods table, rather than look for a function called generic.class
  • for internal generic functions, will only look for methods if at least one of the arguments is an S4 object (shouldn't be a problem if your class is S4).
  • checks the validity of objects it creates with new() (by default, just that the object and its slots have the correct classes). You can add your own checks with setValidity(); note that the basic slot-class checks always run, so a validity function cannot be used to switch them off.
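A minimal sketch of the S4 validity mechanism, using a hypothetical class `nonNegNum` whose single numeric slot must not be negative. It shows that `new()` runs the validity function when slots are supplied, while plain `@<-` assignment only checks the slot's class, so you have to call `validObject()` yourself after modifying a slot.

```r
library(methods)

## hypothetical class: a number that must not be negative
setClass("nonNegNum", slots = c(x = "numeric"))
setValidity("nonNegNum", function(object) {
  # return TRUE if valid, otherwise a string describing the problem
  if (any(object@x < 0)) "x must be non-negative" else TRUE
})

n <- new("nonNegNum", x = 3)  # valid: the validity function returns TRUE

## new() runs validObject() when slots are supplied, so this fails
err_new <- tryCatch(new("nonNegNum", x = -1), error = conditionMessage)

## slot assignment only checks that -5 is "numeric", so this succeeds...
n@x <- -5
## ...and the problem is only caught by an explicit validObject() call
err_check <- tryCatch(validObject(n), error = conditionMessage)
```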

Also look into the group generics Ops, Math and so on. Even if you do need S4, it may be that you can just write methods for these. (Remember, though, that + and - can be unary as well as binary: you need to make sure that the function works as intended when e1 is your S4 class and e2 is missing. Depending on what sort of object your class represents, "as intended" might mean throwing an error.)
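To see which operators a group generic covers, the methods package provides `getGroupMembers()`. A quick check, assuming a standard R installation:

```r
library(methods)

getGroupMembers("Ops")                    # sub-groups Arith, Compare and Logic
getGroupMembers("Arith")                  # includes "+" and "-", which can be unary
getGroupMembers("Logic")                  # "&" and "|"; in S4, `!` is not in this group
getGroupMembers("Ops", recursive = TRUE)  # every operator reachable from Ops
```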

In terms of efficiency, if you are spending a long time in method dispatch rather than in the actual calculation, then you are probably doing something wrong. In particular, consider having your class represent a vector (perhaps a list if you really need to) of whatever sort of number you are working with, so that one dispatch covers many numbers. Once a method has been chosen, the calculation will take the same amount of time regardless of whether you used S3 or S4, with the exception that S4 checks the validity of any object the method creates with new() (or a generator function). That check is typically faster than the method dispatch unless the class is very complex (i.e. has a lot of slots or a deep inheritance structure).
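A sketch of the "one object holds a vector" advice, using a hypothetical S3 class `vnum` that only has a `+` method. Both routes compute the same numbers, but the list-of-scalars route dispatches the method once per element, while the vector route dispatches it once in total. Wrapping each route in `system.time()` is a simple way to compare them on your own machine.

```r
## hypothetical S3 class "vnum": one object holds a whole vector of numbers
vnum <- function(x) structure(as.numeric(x), class = "vnum")

`+.vnum` <- function(e1, e2) {
  if (missing(e2)) return(e1)      # unary plus
  vnum(unclass(e1) + unclass(e2))  # one call handles every element
}

n <- 1e4

## one object per number: `+.vnum` is dispatched n times
scalars <- lapply(seq_len(n), vnum)
res_scalar <- vapply(scalars, function(v) unclass(v + 1), numeric(1))

## one object holding n numbers: `+.vnum` is dispatched once
res_vector <- unclass(vnum(seq_len(n)) + 1)

identical(res_scalar, res_vector)  # same numbers either way
```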

If by "efficiency" you simply meant not writing lots of code then group generics are the best time saver. They work with both S3 and S4.
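For comparison with the S4 example that follows, a minimal S3 group generic might look like the sketch below. The class name `myNum` and its constructor are hypothetical. Inside a group method, R sets `.Generic` to the name of the operator being called, so one function covers every operator in the group; in this sketch, arithmetic results keep the class while comparisons and logic return plain logicals.

```r
## hypothetical S3 class "myNum" wrapping a plain numeric vector
myNum <- function(x) structure(as.numeric(x), class = "myNum")

## one method for the whole Ops group (+, -, *, /, ==, <, &, !, ...)
Ops.myNum <- function(e1, e2) {
  # .Generic is the name of the operator that was called, e.g. "+"
  FUN <- get(.Generic, mode = "function")
  res <- if (missing(e2)) {
    FUN(unclass(e1))               # unary operators: -x, +x, !x
  } else {
    FUN(unclass(e1), unclass(e2))  # binary operators, either argument order
  }
  # keep the class for arithmetic; comparisons and logic return plain logicals
  if (.Generic %in% c("+", "-", "*", "/", "^", "%%", "%/%")) myNum(res) else res
}

a <- myNum(c(1, 2, 3))
a + 1
1 + a
-a
a * a
a > 1
```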

Below is a simple example of a group generic. I've used the example of a class with two slots, x as an ordinary numeric and timestamp as the time it was calculated. We want operators to "act on the x slot" and we achieve that as follows:

```r
library(methods)

## define a simple class with a numeric slot x and a timestamp slot
## (the prototype's Sys.time() is evaluated once, when the class is defined)
timestampedNum <- setClass(
  "timestampedNum",
  slots = c(timestamp = "POSIXct", x = "numeric"),
  prototype = prototype(timestamp = Sys.time())
)
## set methods for the Ops group generic
## we need four of them:
## one for unary +, -
## one for our class [op] something else
## one for something else [op] our class
## one for our class [op] our class
## (Ops also covers comparison (==, <, ...) and logic (&, |) operators, whose
## logical results do not really belong in the numeric x slot; use the
## "Arith" group instead if you only want arithmetic)
setMethod(
  "Ops",
  signature = signature(e1 = "timestampedNum", e2 = "missing"),
  definition = function(e1) timestampedNum(
    x = callGeneric(e1@x),
    timestamp = Sys.time()
  )
)
setMethod(
  "Ops",
  signature = signature(e1 = "timestampedNum", e2 = "ANY"),
  definition = function(e1, e2) timestampedNum(
    x = callGeneric(e1@x, e2),
    timestamp = Sys.time()
  )
)
setMethod(
  "Ops",
  signature = signature(e1 = "ANY", e2 = "timestampedNum"),
  definition = function(e1, e2) timestampedNum(
    x = callGeneric(e1, e2@x),
    timestamp = Sys.time()
  )
)
setMethod(
  "Ops",
  signature = signature(e1 = "timestampedNum", e2 = "timestampedNum"),
  definition = function(e1, e2) timestampedNum(
    x = callGeneric(e1@x, e2@x),
    timestamp = Sys.time()
  )
)

z <- timestampedNum(x = 5)
z
+z
-z
z + 1
1 + z
z + z
```

which produces six objects of class timestampedNum with x slots 5, 5, -5, 6, 6 and 10 respectively.

JDL
  • *"you need to cater for the case when e1 is your S4 class and e2 is missing"*: I think you intend something like "be clear in the failure", since it will already do what is intended (i.e., **fail**). Continuing the code from my [comment](https://stackoverflow.com/questions/75997183/implementing-an-arithmetic-system-in-r#comment134037729_75997183) above, `+vec` fails with `argument "b" is missing, with no default`. I believe the message here is to catch that condition and fail clearly, the way `+ggplot()` errs with `Cannot use \`+\` with a single argument`. – r2evans Apr 14 '23 at 12:08
  • Thanks JDL. If you know, could you explain the S3 way for the case `1 + x` (1 numeric, `x` my class)? This is the point I don't understand. – Stéphane Laurent Apr 14 '23 at 12:35
  • That's the drawback --- there is no way to do that with S3. You have to live with `1+x` being different to `x+1` (or probably not working at all). – JDL Apr 14 '23 at 15:22
  • @r2evans --- not what I meant at all! You can write an S3 method that starts `if(nargs()==1) unclass(a) else unclass(a)+b`. – JDL Apr 14 '23 at 15:39
  • That's interesting, though it is of course contextual, since `+ggplot()` has no notion of a super-class addition, so it should fail. I see your point, though. – r2evans Apr 14 '23 at 15:43
  • @r2evans --- I think I meant to say "make sure it produces the intended result". In some cases like `+ggplot()` that will indeed be an error as it makes no sense. For algebraic objects like the OP was discussing, `-x` and `+x` will often be well defined and useful, even if `+x == x`. – JDL Apr 14 '23 at 15:46
  • "`x + 1` and `1 + x` won't call the same method" ... that is simply wrong. `?groupGeneric` clearly states that `+` dispatches on the class attribute of both arguments even in the S3 case (i.e., when neither argument is an S4 object). – Mikael Jagan Apr 19 '23 at 15:18
  • Fair enough --- hadn't come across that exception, which makes total sense. S3 is probably more tractable as a result. – JDL Apr 19 '23 at 15:44
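A sketch of JDL's `nargs()` suggestion from the comments above, applied to the toy `mymath` class from the question's comments. For a unary call such as `+vec`, the method receives only one argument, so `nargs()` is 1; the alternative of raising a clear error, as r2evans suggests, is noted in a code comment.

```r
## handle unary and binary `+` in one S3 method (toy class "mymath")
`+.mymath` <- function(e1, e2) {
  if (nargs() == 1) {
    # unary +x: return x unchanged; for a class where unary + makes no sense,
    # you could instead call stop("unary `+` is not supported for \"mymath\"")
    return(e1)
  }
  # binary x + y: strip the class to avoid recursion, then restore it
  structure(unclass(e1) + unclass(e2), class = "mymath")
}

vec <- structure(99, class = "mymath")
+vec     # returned unchanged
vec + 1
1 + vec
```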

Just to elaborate on my comment ...

```r
x <- structure(0, class = "zzz")

.S3method("Ops", "zzz",
          function(e1, e2) {
              if (missing(e1))
                  "A" # should never happen
              else if (missing(e2))
                  "B"
              else if (!inherits(e2, "zzz"))
                  "C"
              else if (!inherits(e1, "zzz"))
                  "D"
              else "E"
          })

+x
## [1] "B"
x + 1
## [1] "C"
1 + x
## [1] "D"
x + x
## [1] "E"
```
Mikael Jagan