Calculate group mean while excluding current observation using dplyr

Question

Using dplyr (preferably), I am trying to calculate the group mean for each observation while excluding that observation from the group.

It seems that this should be doable with a combination of rowwise() and group_by(), but both functions cannot be used simultaneously.

Given this data frame:

df <- data_frame(grouping = rep(LETTERS[1:5], 3),
                 value = 1:15) %>%
  arrange(grouping)
df
#> Source: local data frame [15 x 2]
#> 
#>    grouping value
#>       (chr) (int)
#> 1         A     1
#> 2         A     6
#> 3         A    11
#> 4         B     2
#> 5         B     7
#> 6         B    12
#> 7         C     3
#> 8         C     8
#> 9         C    13
#> 10        D     4
#> 11        D     9
#> 12        D    14
#> 13        E     5
#> 14        E    10
#> 15        E    15

I'd like to get the group mean for each observation with that observation excluded from the group, resulting in:

#>    grouping value special_mean
#>       (chr) (int)
#> 1         A     1          8.5  # i.e. (6 + 11) / 2
#> 2         A     6            6  # i.e. (1 + 11) / 2
#> 3         A    11          3.5  # i.e. (1 + 6) / 2
#> 4         B     2          9.5
#> 5         B     7            7
#> 6         B    12          4.5
#> 7         C     3          ...

I've attempted nesting rowwise() inside a function called by do(), but haven't gotten it to work, along these lines:

special_avg <- function(chunk) {
  chunk %>%
    rowwise() #%>%
    # filter or something...?
}

df %>%
  group_by(grouping) %>%
  do(special_avg(.))

At the risk of being really "quick and dirty", this may work for you (but isn't necessarily a general approach) `df %>% group_by(grouping) %>% mutate(avg = (sum(value)-value)/(n()-1))` — steveb, Mar 08 '16 at 04:47
If you need a special function, `combn` can be useful `df %>% group_by(grouping) %>% mutate(avg = combn(value, n()-1, FUN=mean))` — akrun, Mar 08 '16 at 06:12
@akrun, thanks for introducing `combn` but after playing with it trying to solve [this question](http://stackoverflow.com/questions/36450278/summarize-with-dplyr-other-then-groups/36463525#36463525) I found out that you have to reverse the vector in order to omit the current observation `combn(rev(value), n()-1, FUN = mean)` — donlelek, Apr 06 '16 at 23:04

mtoto · Accepted Answer · 2016-03-08T04:54:48.520

17

No need to define a custom function, instead we could simply sum all elements of the group, subtract the current value, and divide by number of elements per group minus 1.

df %>% group_by(grouping) %>%
        mutate(special_mean = (sum(value) - value)/(n()-1))
#   grouping value special_mean
#      (chr) (int)        (dbl)
#1         A     1          8.5
#2         A     6          6.0
#3         A    11          3.5
#4         B     2          9.5
#5         B     7          7.0

edited Mar 08 '16 at 04:54

answered Mar 08 '16 at 04:49

mtoto

23,919
4
58
71

1

Nice—I like the math-based approach way better than futzing around with the data and extra functions. I replaced the group size with `(n() - 1)` so it's more dynamic. – Andrew Mar 08 '16 at 04:52
3

nice, but what if we want to use another function, say `max`? – jlesuffleur May 06 '19 at 10:20
1

@jlesuffleur Perhaps, [this solution](https://stackoverflow.com/a/68920876/3817004) may answer your question. – Uwe Aug 25 '21 at 10:11

score 4 · Answer 2 · answered Aug 25 '21 at 10:08

I came across this old question just by chance and I wondered if there is a general solution which would work for other aggregation functions besides mean() as well, e.g., max() as requested by jlesuffleur or median().

The idea is to omit the actual row from computing the aggregate by looping over the rows within the actual group:

library(dplyr)
df %>% 
  group_by(grouping) %>% 
  mutate(special_mean = sapply(1:n(), function(i) mean(value[-i])))

   grouping value special_mean
   <chr>    <int>        <dbl>
 1 A            1          8.5
 2 A            6          6  
 3 A           11          3.5
 4 B            2          9.5
 5 B            7          7  
...

This will work for max() as well

df %>% 
  group_by(grouping) %>% 
  mutate(special_max = sapply(1:n(), \(i) max(value[-i])))

   grouping value special_max
   <chr>    <int>       <int>
 1 A            1          11
 2 A            6          11
 3 A           11           6
 4 B            2          12
 5 B            7          12
 6 B           12           7
...

For the sake of completeness, here is also a data.table solution:

library(data.table)
setDT(df)[, special_mean := sapply(1:.N, function(i) mean(value[-i])), by = grouping][]

Nice with a more general solution, but perhaps consider to add a `data.table` solution also for the "special case" of `mean` (`df[, mean2 := (sum(value) - value) / (.N - 1), by = grouping]`) (it will be a good dupe-target for leave-one-out stuff). Cheers — Henrik, Sep 30 '21 at 12:17

Calculate group mean while excluding current observation using dplyr

2 Answers2

Linked

Related