How to replace values for similar values by mean for all columns?

Question

I have this data frame here

 df = structure(list(D = c(-76, -74, -72, -70, -44, -42), A = c(83, 
 83, 82, 82, 81, 81), B = c(-0.613, -0.4,-0.5, -0.68, -0.13, -0.26)), row.names = 
c(NA, 6L), class = "data.frame")

I would like to compute the mean of all values in B that share the same value in A.

For instance, -0.613 and -0.4 both correspond to A = 83, and so on for the other groups.

I can simply do this:

   df$Bmean <- with(df, ave(B, A))

However, this only works for B. I need to do the same thing for all columns (B, D, etc.) in df.

akrun · Accepted Answer · 2020-07-06T22:39:09.227

We can use mutate with across from dplyr to apply the mean to multiple columns at once:

library(dplyr) # 1.0.0
df %>% 
   group_by(A) %>%
   mutate(across(everything(), list(mean = ~ mean(.))))

To replace the original columns with their group means instead:

df %>%
   group_by(A) %>%
   mutate(across(everything(), mean, na.rm = TRUE))

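NOTE: na.rm = TRUE is added in case there are any NA values. mean defaults to na.rm = FALSE, which returns NA for any group that contains a missing value. From dplyr 1.1.0 onward, passing extra arguments through across this way is deprecated; the lambda form ~ mean(.x, na.rm = TRUE) is the recommended replacement.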
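
As a quick illustration of why the flag matters, here is a minimal base R sketch. The vector x is made up for the example and is not part of the question's data.

```r
# Hypothetical values with one missing entry
x <- c(-0.613, NA, -0.4)

m_default <- mean(x)                # NA: the missing value propagates
m_narm    <- mean(x, na.rm = TRUE)  # mean of the two non-missing values

m_default
m_narm
```

With the default setting, a single NA in a group would turn that group's mean into NA for every row of the group.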

Or, for fine control over the new column names:

df1 <- df %>% 
         group_by(A) %>%
         mutate(across(everything(), list(mean = ~ mean(.)), .names = "{col}mean"))
df1
# A tibble: 6 x 5
# Groups:   A [3]
#      D     A      B Dmean  Bmean
#  <dbl> <dbl>  <dbl> <dbl>  <dbl>
#1   -76    83 -0.613   -75 -0.506
#2   -74    83 -0.4     -75 -0.506
#3   -72    82 -0.5     -71 -0.59 
#4   -70    82 -0.68    -71 -0.59 
#5   -44    81 -0.13    -43 -0.195
#6   -42    81 -0.26    -43 -0.195

Or, using ave for multiple columns: get the vector of column names other than the grouping column "A" with setdiff (stored as nm1), loop over that vector, extract each column, pass it to ave, and assign the results back to the dataset as new columns named with paste0.

nm1 <- setdiff(names(df), "A")
df[paste0(nm1, "mean")] <- lapply(nm1, function(nm)  ave(df[[nm]], df$A))
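
If the goal is to overwrite the original columns rather than add new ones, the same base R idea can probably be adapted as below. This is a sketch, not part of the original answer: df_repl is a hypothetical name for a working copy, and the na.rm handling is an added assumption.

```r
# Rebuild the question's data frame so the sketch is self-contained
df <- data.frame(
  D = c(-76, -74, -72, -70, -44, -42),
  A = c(83, 83, 82, 82, 81, 81),
  B = c(-0.613, -0.4, -0.5, -0.68, -0.13, -0.26)
)

# df_repl is a hypothetical copy, so the original df stays untouched
df_repl <- df

# Every column except the grouping column "A"
nm1 <- setdiff(names(df_repl), "A")

# Overwrite each column with its per-group mean; FUN must be passed by name
df_repl[nm1] <- lapply(nm1, function(nm) {
  ave(df_repl[[nm]], df_repl$A, FUN = function(v) mean(v, na.rm = TRUE))
})

df_repl
```

Assigning to df_repl[nm1] instead of df_repl[paste0(nm1, "mean")] is the only structural change from the version above: the existing names are reused, so the columns are replaced in place.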

Thanks. I would like to replace the original columns by the new ones (mean) — Tpellirn, Jul 06 '20 at 22:37
@Tpellirn If it is to replace the original, just do `df %>% group_by(A) %>% mutate(across(everything(), mean))` — akrun, Jul 06 '20 at 22:38
@Tpellirn I used `dplyr 1.0.0`; `across` was introduced in that version — akrun, Jul 06 '20 at 22:49
@Tpellirn if your dplyr version is less than 1.0.0, then the answer in the other post should work for you. In the future, those functions with suffix `_at/_all` etc will be deprecated though — akrun, Jul 06 '20 at 22:52

score 1 · Answer 2 · edited Jul 06 '20 at 22:39

You could use one of these approaches. The first keeps every row, and the second collapses to one row per group:

library(dplyr)
# Approach 1: keep all rows, replace each value with its group mean
df %>% group_by(A) %>% mutate_all(mean, na.rm = TRUE)

# A tibble: 6 x 3
# Groups:   A [3]
      D     A      B
  <dbl> <dbl>  <dbl>
1   -75    83 -0.506
2   -75    83 -0.506
3   -71    82 -0.59 
4   -71    82 -0.59 
5   -43    81 -0.195
6   -43    81 -0.195

# Approach 2: collapse to one row per value of A
df %>% group_by(A) %>% summarise_all(mean, na.rm = TRUE)

# A tibble: 3 x 3
      A     D      B
  <dbl> <dbl>  <dbl>
1    81   -43 -0.195
2    82   -71 -0.59 
3    83   -75 -0.506
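
Since the `_all` helpers are superseded by across in dplyr 1.0.0 and later, Approach 2 can likely be written as in the sketch below. It assumes dplyr 1.0.0 or newer is installed, and df_means is a hypothetical name for the result.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where across() is available

# Rebuild the question's data frame so the sketch is self-contained
df <- data.frame(
  D = c(-76, -74, -72, -70, -44, -42),
  A = c(83, 83, 82, 82, 81, 81),
  B = c(-0.613, -0.4, -0.5, -0.68, -0.13, -0.26)
)

# One row per value of A; everything() skips the grouping column itself
df_means <- df %>%
  group_by(A) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))

df_means
```

Swapping summarise for mutate in the same pipeline should give the row-preserving behaviour of Approach 1.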
