dplyr compatibility issue with custom S3 classes

Question

I wrote a small R package, called errors, that associates errors with numeric vectors and enables transparent error propagation. I am struggling to make it fully compatible with dplyr.

First, let's take the well-known iris dataset and assign a 5% error to every numeric variable:

library(errors)
library(dplyr)

iris_errors <- iris %>%
  mutate_at(vars(-Species), funs(set_errors(., .*0.05)))

head(iris_errors)                         # ok

Every column is an errors S3 object, with 150 values and their associated 150 errors:

length(iris_errors$Sepal.Length)          # 150
length(errors(iris_errors$Sepal.Length))  # 150

Now, let's say we want the average for each column by species:

iris_mean <- iris_errors %>%
  group_by(Species) %>%
  summarise_all(mean)

head(iris_mean)                           # error

The summary appears to succeed, but the formatter fails when we try to print the result. At some point, all the errors but the first one were lost:

length(iris_mean$Sepal.Length)            # 3
length(errors(iris_mean$Sepal.Length))    # 1!
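
The same kind of mismatch can be reproduced without the package. The following is a minimal base-R sketch, assuming a hypothetical toy class `toy_errors` that stores one error per value in an attribute. It is not the errors package's implementation, and the last step is only one plausible way the symptom can arise (per-group results combined, then attributes copied from the first result); it is not confirmed to be what dplyr does internally.

```r
# Hypothetical toy class: a numeric vector with a parallel "errors" attribute.
toy_errors <- function(x, e) {
  stopifnot(length(x) == length(e))
  structure(x, errors = e, class = "toy_errors")
}

# Subsetting method that keeps values and errors aligned.
`[.toy_errors` <- function(x, i) {
  toy_errors(unclass(x)[i], attr(x, "errors")[i])
}

x <- toy_errors(c(5.1, 4.9, 4.7), c(0.255, 0.245, 0.235))

# Going through the method keeps one error per value.
ok <- x[1:2]
length(ok)                        # 2
length(attr(ok, "errors"))        # 2

# One result per "group", each carrying a single value and a single error.
groups <- list(x[1], x[2], x[3])

# Assumed failure mode: bare values are combined, then the attributes of the
# first result are copied onto the combined vector without any class dispatch.
combined <- unlist(lapply(groups, as.numeric))
attributes(combined) <- attributes(groups[[1]])

length(combined)                  # 3
length(attr(combined, "errors"))  # 1
```

Any code that moves attributes around without calling the class's own methods can break the "one error per value" invariant in this way.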

Did you define a subsetting operator for your class? It would be nicer if you provided a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) in the question itself that didn't rely on installing the package to debug. — MrFlick, May 09 '17 at 20:31
Yes, I defined the subsetting operators. The same example, without the package, would require many functions defined in the package, so I don't see the point of it. Moreover, it's on CRAN and it has no dependencies. — Iñaki Úcar, May 09 '17 at 22:41

score 1 · Answer 1 · answered May 15 '17 at 09:53

It seems that working with attributes in dplyr is not a trivial issue (see tidyverse/dplyr#2773). The maintainers have plans to address this, but it won't work for the time being.

This is a pure-R version of the grouped summary I was trying to do with dplyr:

by(iris_errors, iris_errors$Species, function(i) {
  i$Species <- NULL
  lapply(i, mean)
})
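
The call above returns one list of means per species rather than a table. A rough sketch of how the per-group results could be bound into a single data frame in base R follows. It uses the plain numeric iris data so that it runs without the package; `split()` stands in for `by()`, and the names `res` and `means` are illustrative. Whether the errors attribute would survive `as.data.frame()` and `rbind()` depends on the methods the class defines, so treat that part as an assumption to check.

```r
# Split the numeric columns by species, then average each column per group.
res <- lapply(split(iris[-5], iris$Species), function(i) lapply(i, mean))

# Turn each per-group list into a one-row data frame and stack the rows.
means <- do.call(rbind, lapply(res, as.data.frame))

means                 # one row per species, one column per variable
dim(means)            # 3 4
```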
