2

I'd like to create an S3 class that extends data.table by adding attributes that would be used by other methods of that class. In the example below I'm adding an attribute colMeas that holds the name of the column with the measurement:

library(data.table)

myclass <- function(dt, colMeas) {

  stopifnot(data.table::is.data.table(dt))

  data.table::setattr(dt, "colMeas", colMeas)
  data.table::setattr(dt, "class", union("myclass", class(dt)))

}

is.myclass <- function(obj) inherits(obj, "myclass")

I have a method that modifies the existing measurement column:

modCol <- function(obj, arg) {
  UseMethod("modCol")
}

# Modify the existing column
modCol.myclass <- function(obj, arg) {

  stopifnot(is.myclass(obj))
  stopifnot(is.numeric(arg))

  colMeas <- attr(obj, "colMeas")

  obj[,
      (colMeas) := get(colMeas) + arg]
}

And a method that adds a new column:

addCol <- function(obj, arg) {
  UseMethod("addCol")
}

# Add a column
addCol.myclass <- function(obj, arg) {

  stopifnot(is.myclass(obj))
  stopifnot(is.numeric(arg))

  colMeas <- attr(obj, "colMeas")

  obj[,
      colNew := get(colMeas) + arg]

  data.table::setattr(obj, "colNew", "colNew")
}

I'm using everything as follows:

library(data.table)
dt = data.table(x = 1:10,
                y = rep(1, 10))
myclass(dt, colMeas = "y")


modCol(dt, 10)
addCol(dt, 10)

Which gives:

> dt
     x  y colNew
 1:  1 11     21
 2:  2 11     21
 3:  3 11     21
 4:  4 11     21
 5:  5 11     21
 6:  6 11     21
 7:  7 11     21
 8:  8 11     21
 9:  9 11     21
10: 10 11     21

> attributes(dt)
$names
[1] "x"      "y"      "colNew"

$row.names
 [1]  1  2  3  4  5  6  7  8  9 10

$class
[1] "myclass"    "data.table" "data.frame"

$.internal.selfref
<pointer: 0x7f841e016ee0>

$colMeas
[1] "y"

$colNew
[1] "colNew"

The question is more about the R/S3 "doctrine". In the methods above I'm modifying the data.table object "in-place" and I can call these functions without assigning results to new objects. Is this a correct way of handling data.table objects in S3 classes? Or should I add explicit return(obj) to all functions and then assign the results like so:

dt = myclass(dt, colMeas = "y")
    
dt = modCol(dt, 10)
dt = addCol(dt, 10)

Wouldn't that lead to an excessive copying of the dt object?

mattek
  • 903
  • 1
  • 6
  • 18

2 Answers2

2

I would vote Yes to modify it in place, that is, do not make it necessary to catch the returned value.

(I changed my mind twice during thinking about this reply, but now I'm sure).

There are several function in the data.table that modify objects in place, setnames(...) for example. There is clear precedence for this.

There is also a general phiolosophy in the data.table code base to work by reference, it is an important feature that sets it apart from data.frames

Playing into this design philosophy sounds like the rigth thing to do.

Note: I think it's still nice to invisibly return the data.table object.

Sirius
  • 5,224
  • 2
  • 14
  • 21
  • Thanks! What do you mean by invisibly returning the object? The explicit `return` at the end of the function? – mattek Mar 23 '21 at 23:52
  • 1
    Oops, I blanked. You meant to use `invisible(obj)` instead of `return(obj)`. – mattek Mar 24 '21 at 00:05
  • 1
    see the source code of `data.table::setnames` , the function body ends with `invisible(x)`. It is a special operation that temporarily makes x invisible, in the sense that it is not printed after returned. You can stilll assign it somewher, and you can still call eg. `print( setnames(...) )` to have it print anyway, but its a nice gesture when its not necessary to pick the return value up. It means you can do setnames(x) without cluttering your terminal and still chain it should you want to. – Sirius Mar 24 '21 at 00:07
2

Your code seems to rely on an (AFAIK) undocumented property of union(), that it is order-preserving. You'll want "myclass" to appear first in the class list in order to dispatch accurately. So I'd suggest you update your class list with an append() rather than a union():

data.table::setattr(dt, "class", append(c("myclass"), class(dt)))

  • 1
    Also, since you're apparently a defensive programmer, I'd suggest you include `stopifnot(data.table:::selfrefok(obj)==1)` in the body of your `is.myclass()` method. This will give you warning of a copying of the `data.table` if its reference had been passed to a non-`data.table` method which -- for whatever reason -- has copied it. The copied `data.table` will have all of its members (AFAIK), but it is hazardous because it can't be updated reliably by `data.table`. See https://stackoverflow.com/questions/74831040/does-data-table-export-a-method-for-testing-its-internal-selfref-sentinel-if-n – Clark Thomborson Dec 19 '22 at 18:42