I wrote a small R package, called errors, that associates errors with numeric vectors and enables transparent error propagation. I am struggling to make it fully compatible with dplyr.
First, let's take the well-known iris dataset and assign a 5% error to every numerical variable:
library(errors)
library(dplyr)
iris_errors <- iris %>%
  mutate_at(vars(-Species), funs(set_errors(., .*0.05)))
head(iris_errors) # ok
Every column is an errors S3 object, with 150 values and their 150 associated errors:
length(iris_errors$Sepal.Length) # 150
length(errors(iris_errors$Sepal.Length)) # 150
Now, let's say we want the average for each column by species:
iris_mean <- iris_errors %>%
  group_by(Species) %>%
  summarise_all(mean)
head(iris_mean) # error
The summary itself seems to work, but the formatter fails when we try to print the result. What happened is that, at some point, all the errors but the first one were lost:
length(iris_mean$Sepal.Length) # 3
length(errors(iris_mean$Sepal.Length)) # 1!
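To illustrate how this kind of mismatch can arise, here is a minimal base R sketch. It uses a hypothetical toy class (`toy_errors`, with helpers `new_toy` and `combine_toy`) rather than the real errors package, and it assumes, without having confirmed it, that the per-group results are combined by concatenating the bare values and reusing the attributes of the first result. Under that assumption, a per-element attribute ends up with length 1 while the values have length 3:

```r
# Toy stand-in for a vector that carries one error per value.
# Hypothetical class for illustration only, not the errors package itself.
new_toy <- function(value, err) {
  stopifnot(length(value) == length(err))
  structure(value, err = err, class = "toy_errors")
}

# Three per-group summaries, each of length 1 (made-up numbers).
res <- list(new_toy(5.0, 0.25), new_toy(5.9, 0.30), new_toy(6.6, 0.33))

# Naive combination (assumed mechanism): concatenate the bare values,
# then copy the attributes of the first result onto the whole vector.
combined <- vapply(res, as.numeric, numeric(1))
attributes(combined) <- attributes(res[[1]])

length(combined)               # 3
length(attr(combined, "err"))  # 1

# Class-aware combination: collect values and errors in parallel.
combine_toy <- function(items) {
  new_toy(
    vapply(items, as.numeric, numeric(1)),
    vapply(items, function(i) attr(i, "err"), numeric(1))
  )
}

fixed <- combine_toy(res)
length(attr(fixed, "err"))     # 3
```

If something like this is what happens internally, the fix would need the combining step to be aware of the per-element attribute, instead of treating it as metadata that applies to the vector as a whole.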