Below I lay out a simple case where I define a class "foo" over a double object. I want any arithmetic operation involving such an object to strip it of its "foo" class and proceed normally.

I can partially make it work, but not robustly. See below:

library(vctrs)

x <- new_vctr(42, class = "foo")

# then this won't work (expected)
x * 2
#> Error: <foo> * <double> is not permitted

# define vec_arith method
vec_arith.foo <- function(op, x, y, ...) {
  print("we went there")
  # wrap x in vec_data to strip off the class, and forward to `vec_arith_base`
  vec_arith_base(op, vec_data(x), y)
}

# now this works  
x * 2
#> [1] "we went there"
#> [1] 84

# but this doesn't, and doesn't go through vec_arith.foo
x * data.frame(a=1)
#> Warning: Incompatible methods ("*.vctrs_vctr", "Ops.data.frame") for "*"
#> Error in x * data.frame(a = 1): non-numeric argument to binary operator

# while this works
42 * data.frame(a=1)
#>    a
#> 1 42

How can I make x * data.frame(a=1) return the same as 42 * data.frame(a=1)?

traceback() doesn't return anything so I'm not sure how to debug this.

moodymudskipper

1 Answer

It is an intriguing question which caught my interest. I am no expert on this issue, but I found a way to get it working. It’s a rather dirty workaround and no real solution. There should be a better way to solve this issue using the {vctrs} package.

The problem is complicated, because we are dealing with an internal generic * which uses double dispatch (see here). The important part is that:

Generics in the Ops group, which includes the two-argument arithmetic and Boolean operators like - and &, implement a special type of method dispatch. They dispatch on the type of both of the arguments, which is called double dispatch.

It turns out that for a call like x * y, R looks up a method for each of the two arguments, as if for both x * y and y * x. Then there are three possible outcomes:

The methods are the same, so it doesn’t matter which method is used.

The methods are different, and R falls back to the internal method with a warning.

One method is internal, in which case R calls the other method.
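
The three outcomes can be reproduced with plain S3 classes. The following is a minimal sketch, assuming a fresh R session; the classes cls_a and cls_b and their methods are hypothetical and exist only for illustration.

```r
# Hypothetical classes, used only for illustration.
a <- structure(10, class = "cls_a")
b <- structure(20, class = "cls_b")

# Only cls_a has a method (the other side is internal):
# the cls_a method is used whichever side `a` is on.
`*.cls_a` <- function(e1, e2) "cls_a method"
one_method_left  <- a * 2
one_method_right <- 2 * a

# Both arguments resolve to the same method: that method is used.
same_method <- a * a

# Two different methods: R warns and falls back to the internal `*`.
`*.cls_b` <- function(e1, e2) "cls_b method"
conflict_warning <- NULL
conflict <- withCallingHandlers(
  a * b,
  warning = function(w) {
    conflict_warning <<- conditionMessage(w)  # keep the message for inspection
    invokeRestart("muffleWarning")
  }
)

# The internal method ignores both custom methods and just multiplies.
as.numeric(conflict)
```

Capturing the warning with withCallingHandlers() is only a convenience here, so that the fallback result can be inspected alongside the message.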

Let's keep this in mind when looking at the problem. I first refrained from using the {vctrs} package and tried to reconstruct the problem in two ways. First I tried to multiply an object of a new class with a list. This reproduces the error from the original example:

# let's create a new object
x1 <- 10
class(x1) <- "myclass"

# and multiply it with a list
l <- list(1)    
x1 * l 

# same error as in the original example, but without the warning
#> Error in x1 * l: non-numeric argument to binary operator

sloop::s3_dispatch(x1 * l)
#>    *.myclass
#>    *.default
#>    Ops.myclass
#>    Ops.default
#> => * (internal)

sloop::s3_dispatch(l * x1)
#>    *.list
#>    *.default
#>    Ops.list
#>    Ops.default
#> => * (internal)

We can see with the {sloop} package that the internal generic is called. This internal method has no way to apply * to lists. So let's see whether we can override it with our own method:

`*.myclass` <- function(x, y) {
  print("myclass")
  if (is.list(y)) {
    print("if clause")
    y <- unlist(y)
  } else {
    print("didn't use if clause")
  }
  
  x + y # to see if it's working the operation is changed
}

x1 * l # now working
#> [1] "myclass"
#> [1] "if clause"
#> [1] 11
#> attr(,"class")
#> [1] "myclass"

sloop::s3_dispatch(x1 * l)
#> => *.myclass
#>    *.default
#>    Ops.myclass
#>    Ops.default
#>  * * (internal)

sloop::s3_dispatch(l * x1)
#>    *.list
#>    *.default
#>    Ops.list
#>    Ops.default
#> => * (internal)

This worked (although we really should not alter the objects in the method call). Here we now have the third case described above: one method is internal, so the non-internal method is called. Unlike data.frames, lists have no existing method for arithmetic operations. To reproduce the conflict, we therefore need an example in which two objects of different classes, each with its own method, are multiplied.

# another object
y1 <- 20
class(y1) <- "another_class"

# here we still only have one method `*.myclass`:
x1 * y1 # working
#> [1] "myclass"
#> [1] "didn't use if clause"
#> [1] 30
#> attr(,"class")
#> [1] "myclass"

sloop::s3_dispatch(x1 * y1)
#> => *.myclass
#>    *.default
#>    Ops.myclass
#>    Ops.default
#>  * * (internal)

sloop::s3_dispatch(y1 * x1)
#>    *.another_class
#>    *.default
#>    Ops.another_class
#>    Ops.default
#> => * (internal)

# let's introduce another method:
`*.another_class` <- function(x, y) {
  x - y # again, to see if it is working we change the operation
}

# now we get (only) a warning, but with a different result!
x1 * y1 
#> Warning: Incompatible methods ("*.myclass", "*.another_class") for "*"
#> [1] 200
#> attr(,"class")
#> [1] "myclass"

sloop::s3_dispatch(x1 * y1)
#> => *.myclass
#>    *.default
#>    Ops.myclass
#>    Ops.default
#>  * * (internal)

sloop::s3_dispatch(y1 * x1)
#> => *.another_class
#>    *.default
#>    Ops.another_class
#>    Ops.default
#>  * * (internal)

Here we now have the second case described above: the two methods are different, and R falls back to the internal method with a warning. This produces the "unaltered" result 20 * 10 = 200.

So regarding the original problem, my understanding is that we have two conflicting methods, "*.vctrs_vctr" and "Ops.data.frame". For this reason, the internal method * (internal) is called, and this internal method does not accept lists or data.frames (handling them is normally done inside Ops.data.frame, which is not used here because of the conflicting methods).
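
The role of Ops.data.frame can be checked in isolation. This is a small sketch using base R only; unclass() is used here as a stand-in for "the data frame method was not used", which is an assumption made for illustration rather than exactly what happens inside the fallback.

```r
df <- data.frame(a = 1)

# With normal dispatch, Ops.data.frame handles the multiplication.
via_method <- 42 * df

# Without its class a data frame is just a list, and the internal `*`
# does not accept lists. The error message is captured, not raised.
internal_error <- tryCatch(
  42 * unclass(df),
  error = function(e) conditionMessage(e)
)
internal_error
```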

library(vctrs)

z <- new_vctr(42, class = "foo")
a <- data.frame(a = 1)

z * a
#> Warning: Incompatible methods ("*.vctrs_vctr", "Ops.data.frame") for "*"
#> Error in z * a: non-numeric argument to binary operator

sloop::s3_dispatch(z * a)
#>    *.foo
#> => *.vctrs_vctr
#>    *.default
#>    Ops.foo
#>    Ops.vctrs_vctr
#>    Ops.default
#>  * * (internal)

sloop::s3_dispatch(a * z)
#>    *.data.frame
#>    *.default 
#> => Ops.data.frame
#>    Ops.default
#>  * * (internal)

Here again, we can see that two different methods exist and therefore, the internal method is used.

The dirty workaround I came up with is to:

  1. create a non-internal generic *,
  2. explicitly define *.foo, and
  3. explicitly define *.numeric, which will be called once the objects are "unclassed" with vec_data().

`*` <- function(x, y) {
  UseMethod("*")
}

`*.foo` <- function(x, y) {
  op_fn <- getExportedValue("base", "*")
  op_fn(vec_data(x), vec_data(y))
}

`*.numeric` <- function(x, y) {
  print("numeric")
  fn <- getExportedValue("base", "*")
  fn(x, y)
}

z * a
#> [1] "numeric"
#>    a
#> 1 42

sloop::s3_dispatch(z * a)
#> => *.foo
#>  * *.vctrs_vctr
#>    *.default
#>    Ops.foo
#>    Ops.vctrs_vctr
#>    Ops.default
#>  * * (internal)

sloop::s3_dispatch(a * z)
#>    *.data.frame
#>    *.default
#> => Ops.data.frame
#>    Ops.default
#>  * * (internal)

Created on 2021-01-13 by the reprex package (v0.3.0)

Unfortunately, I am not 100% sure what is happening. It seems that overriding the * generic also overrides the way R handles double dispatch for this generic. Let's revisit the multiplication of two objects of different classes, x1 * y1, from above. Earlier, a method was found for each argument, and since the two were different a warning was issued and the internal method was chosen. Now we observe the following:

x1 * y1 # working without warning
#> [1] "myclass"
#> [1] "didn't use if clause"
#> [1] 30
#> attr(,"class")
#> [1] "myclass"

sloop::s3_dispatch(x1 * y1)
#> => *.myclass
#>    *.default
#>    Ops.myclass
#>    Ops.default
#>  * * (internal)

sloop::s3_dispatch(y1 * x1)
#> => *.another_class
#>    *.default
#>    Ops.another_class
#>    Ops.default
#>  * * (internal)

We have two conflicting methods, and still R chooses the method of the first object, without issuing a warning.
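
A likely explanation is that an ordinary function calling UseMethod() dispatches on its first argument only, so the second argument's method is never considered. The sketch below illustrates this with a hypothetical generic, times_sketch, and hypothetical classes cls_a and cls_b, so that the real * is left untouched.

```r
# A hypothetical closure generic; UseMethod() dispatches on the
# first argument only.
times_sketch <- function(x, y) UseMethod("times_sketch")

times_sketch.cls_a <- function(x, y) "cls_a method"
times_sketch.cls_b <- function(x, y) "cls_b method"

a <- structure(10, class = "cls_a")
b <- structure(20, class = "cls_b")

first_is_a <- times_sketch(a, b)  # the class of `b` plays no role
first_is_b <- times_sketch(b, a)  # the class of `a` plays no role
c(first_is_a, first_is_b)
```

This matches the behaviour seen above: once * is a regular S3 generic, there is no comparison of two candidate methods and therefore no "Incompatible methods" warning.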

This is of course not a real solution to the problem, for many reasons:

  1. Overriding the generics of arithmetic operations doesn't seem to be a good idea, since it is likely to break code.
  2. We would also need to deal with data.frame(a = 1) * z, which still doesn't work (here we would need to override the existing code of Ops.data.frame).
  3. We shouldn't need to write methods for each arithmetic operation.
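
On the third point, base R's Ops group generic already lets one method cover every arithmetic and comparison operator. The following is a minimal sketch, assuming a fresh session in which * has not been masked; the class name foo_sketch is hypothetical, and the sketch does not use {vctrs}.

```r
x <- structure(42, class = "foo_sketch")

# One group method for all operators in the Ops group.
Ops.foo_sketch <- function(e1, e2) {
  # Remove the class from whichever argument carries it.
  strip <- function(v) if (inherits(v, "foo_sketch")) unclass(v) else v
  # .Generic holds the operator name, e.g. "*"; fetch the primitive.
  op <- .Primitive(.Generic)
  if (missing(e2)) {
    op(strip(e1))             # unary case, e.g. -x
  } else {
    op(strip(e1), strip(e2))  # binary case
  }
}

x * 2
2 - x
-x
```

This only removes the need for one method per operator. It does not by itself resolve the conflict with Ops.data.frame, because two different methods are still found for x * data.frame(a = 1).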

The {vctrs} package should help us to find a simpler and safer solution, and maybe it exists already. It might be worth opening an issue on GitHub.
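
For readers on newer R versions: R 4.3.0 added the base generic chooseOpsMethod(), which is consulted when the two arguments resolve to different methods. This is a different technique from the workaround above, not something the answer was written against. The sketch below assumes R >= 4.3.0 and a fresh session, uses a hypothetical plain S3 class pref_sketch instead of {vctrs}, and has not been checked against new_vctr() objects.

```r
x  <- structure(42, class = "pref_sketch")
df <- data.frame(a = 1)

# Strip the hypothetical class from either side, then multiply normally.
`*.pref_sketch` <- function(e1, e2) {
  strip <- function(v) if (inherits(v, "pref_sketch")) unclass(v) else v
  strip(e1) * strip(e2)
}

# Declare that the pref_sketch method wins when methods conflict.
chooseOpsMethod.pref_sketch <- function(x, y, mx, my, cl, reverse) TRUE

# Guarded, because chooseOpsMethod() is not consulted before R 4.3.0.
if (getRversion() >= "4.3.0") {
  left  <- x * df   # pref_sketch on the left
  right <- df * x   # pref_sketch on the right
  print(left)
  print(right)
}
```

Returning TRUE unconditionally is the bluntest possible choice; a real method would probably inspect y, mx and my before claiming priority.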

TimTeaFan
  • Thanks Tim for the insightful answer. I think the solution, if it exists, is to define an S4 class and implement a method for foo, data.frame, and an identical method for data.frame, foo, but I haven't managed to do it. – moodymudskipper Jan 13 '21 at 00:54
  • S4 might be an option. I also thought that maybe using `foo` as a subclass and then define methods with `NextMethod()` that will fall back to `numeric` might do the trick, but I didn't get it working. The problem is probably that there is no specific `*.numeric` method, unless you define it yourself. I also still think that the {vctrs} package should have a solution for this problem. The closing sentence of Advanced R's Chp. 13 is: "if you want to implement robust double dispatch for algebraic operators, I recommend using the vctrs package". – TimTeaFan Jan 13 '21 at 21:47
  • I also wonder how others handled this type of method conflict in the past. You are probably not the first one to encounter this issue. Or is it that arithmetic operations between `foo` and `data.frame`s are simply not allowed? – TimTeaFan Jan 13 '21 at 21:50
  • In vctrs it all goes through `*.vctrs`, or something similar. vctrs cannot by itself decide that it has priority over a data frame method. A real double dispatch that is not vctrs to vctrs or vctrs to atomic needs to be S4 from my understanding. – moodymudskipper Jan 13 '21 at 22:55
  • I assume that your ultimate goal is to define arithmetic operations for `foo` which differ from the "normal" operations, so for example `x * y` does some operation which is not equal to just multiplying the two objects, right? If you just want to introduce the normal arithmetic operations for `foo`, including operations with `data.frame`s, then it might be enough to override the `Ops.data.frame` method (in this case you can't use the {vctrs} package). – TimTeaFan Jan 14 '21 at 13:08