Taking column means as part of a function

Question

I have the following function:

estimate = function(df, y_true) {
        
        R = nrow(df)
        
        y_estimated = apply(df, 2, mean)
        
        ((sqrt( (y_estimated - y_true)^2 / R)) / y_true) * 100
}


df = iris[1:10,2:4]
y_true = c(3, 1, 0.4)
estimate(df = df, y_true = y_true)
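Here is what the function returns for one column, written out. Column j of df has R rows with entries x_ij and column mean ȳ_j, and y_j = y_true[j] is its true value. R's element-wise vector arithmetic matches y_true to the columns by position. The square root of a square is an absolute value, so each entry is the absolute error of the column mean, divided by √R and expressed as a percentage of the true value:

```latex
\mathrm{estimate}_j
  = \frac{100}{y_j}\sqrt{\frac{(\bar{y}_j - y_j)^2}{R}}
  = \frac{100\,\lvert \bar{y}_j - y_j \rvert}{y_j \sqrt{R}},
\qquad
\bar{y}_j = \frac{1}{R}\sum_{i=1}^{R} x_{ij}
```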

User bird provided this, and it works great. However, I also need to find the means by group. If we change df to df = iris[, 2:5], how do I find the means of each column by Species to use in the function? I figured something like this would work, but no luck:

estimate = function(df, y_true, group) {
  
  R = nrow(df)
  
  y_estimated = df %>% group_by(group) %>% apply(df, 2, mean)
  
  ((sqrt( (y_estimated - y_true)^2 / R)) / y_true) * 100
}



df = iris[2:5]
y_true = c(3, 1, 0.4)
group=df$Species 

estimate(df = df, y_true = y_true, group=group)

Using colMeans also did not work.

This is an extension of this post, which explains the purpose of each variable.
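For reference, colMeans fails on iris[2:5] most likely because Species is a factor column, and colMeans needs all-numeric input. Drop that column and split the rows by Species, and colMeans gives the per-group means. Below is a minimal base R sketch of that idea. It only computes the grouped means and does not call estimate():

```r
df = iris[2:5]

# split() on a data frame returns a list with one data frame per Species level;
# only the numeric columns are kept
by_species = split(df[setdiff(names(df), "Species")], df$Species)

# colMeans() on each piece; sapply() gives one column per species
group_means = sapply(by_species, colMeans)
group_means
```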

Does [calculate mean by group](https://stackoverflow.com/q/11562656/3358272) answer your question? — r2evans, Feb 24 '22 at 18:44
Many of those options work fine outside the function. I think the issue I was having was how to denote the groups and the specific columns within the function so it can be applied to multiple datasets. For example, I can aggregate specific columns, but then I have to change the function to specify the correct columns each time I get a new dataset. I think @IceCreamToucan's answer resolves that though. Thanks! — hugh_man, Feb 24 '22 at 19:03

score 2 · Accepted Answer · answered Feb 24 '22 at 18:58

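Rather than modifying your function, you can keep it as-is and apply it group-wise to your data. If you use group_by and then group_modify, the function you pass to group_modify receives the data frame subset to the rows of that specific group, with the grouping column (Species) removed, so estimate() only sees the numeric columns. Because group_modify expects a data frame back, the named vector returned by estimate() is converted with as.data.frame.list.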
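To see what the function passed to group_modify receives, you can have it report the size and column names of each group's .x. In this quick check the Species column does not appear among the columns:

```r
library(dplyr, warn.conflicts = FALSE)

seen = iris[2:5] %>%
  group_by(Species) %>%
  # .x holds this group's rows; the grouping column is not included
  group_modify(~ data.frame(rows = nrow(.x),
                            cols = paste(names(.x), collapse = ", ")))
seen
```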

estimate = function(df, y_true) {
        
        R = nrow(df)
        
        y_estimated = apply(df, 2, mean)
        
        ((sqrt( (y_estimated - y_true)^2 / R)) / y_true) * 100
}


df = iris[2:5]
y_true = c(3, 1, 0.4)

library(dplyr, warn.conflicts = FALSE)

df %>% 
  group_by(Species) %>% 
  group_modify(~ as.data.frame.list(estimate(., y_true)))
#> # A tibble: 3 × 4
#> # Groups:   Species [3]
#>   Species    Sepal.Width Petal.Length Petal.Width
#>   <fct>            <dbl>        <dbl>       <dbl>
#> 1 setosa           2.02          6.53        5.44
#> 2 versicolor       1.08         46.1        32.7 
#> 3 virginica        0.123        64.4        57.5

Created on 2022-02-24 by the reprex package (v2.0.1)
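If you would rather avoid dplyr, the same group-wise idea can be sketched in base R with split() and sapply(). The grouping column is passed by name, so the helper can be reused on other datasets without editing it. Note that estimate_by_group is a hypothetical helper written for this illustration, not part of any package:

```r
estimate = function(df, y_true) {
  R = nrow(df)
  y_estimated = apply(df, 2, mean)
  ((sqrt((y_estimated - y_true)^2 / R)) / y_true) * 100
}

# Hypothetical helper: `group` is the name of the grouping column
estimate_by_group = function(df, y_true, group) {
  # Drop the grouping column so only numeric columns reach estimate()
  values = df[setdiff(names(df), group)]
  # One data frame per group level
  pieces = split(values, df[[group]])
  # sapply() returns one column per group; t() puts groups in rows
  t(sapply(pieces, estimate, y_true = y_true))
}

df = iris[2:5]
y_true = c(3, 1, 0.4)
res = estimate_by_group(df, y_true, "Species")
res
```

The result is a matrix rather than a tibble, with one row per species and the same values as the group_modify output above.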

Interesting! I (obviously) didn't think of that. Thanks for your help! — hugh_man, Feb 24 '22 at 19:07
