Summarize a Variable by All But Group

Question

I have a data.frame and I need to calculate the mean per "anti-group" (i.e. per Name, below).

Name     Month  Rate1     Rate2
Aira       1      12        23
Aira       2      18        73
Aira       3      19        45
Ben        1      53        19
Ben        2      22        87
Ben        3      19        45
Cat        1      22        87
Cat        2      67        43
Cat        3      45        32

My desired output is like below, where the values for Rate1 and Rate2 are the means of the column's values not found in each group. Please disregard the value, I have made it up for the example. I'd prefer to do this using dplyr if possible.

Name    Rate1   Rate2
Aira    38      52.2
Ben     30.5    50.5
Cat     23.8    48.7

Any help much appreciated! Thank you!

PS - Thanks to Ianthe for copying their question and their question's data but changing the question slightly. (Mean per group in a data.frame)

what did you try? You are tagging `dplyr` so did you try `summarise_all`, `summarise_at`, etc...? — Sotos, Feb 10 '20 at 15:25
Please make an earnest attempt so we can help with implementation and helps your learning curve. — Parfait, Feb 10 '20 at 15:26
Well if I wanted per group, I could easily do the following: `df %>% group_by(Name) %>% summarize(Rate1=mean(Rate1), Rate2=mean(Rate2))` but that calculates the mean of the Rate columns by group. I want to calculate the mean of the rate columns by everything **but** the group. — tubaguy, Feb 10 '20 at 15:32

score 2 · Answer 1 · answered Feb 10 '20 at 15:42

Here is another idea via base R,

do.call(rbind, lapply(unique(df$Name), function(i)colMeans(df[!df$Name %in% i,-c(1:2)])))

#        Rate1    Rate2
#[1,] 38.00000 52.16667
#[2,] 30.50000 50.50000
#[3,] 23.83333 48.66667

or complete with Name,

cbind.data.frame(Name = unique(df$Name), res1)

#  Name    Rate1    Rate2
#1 Aira 38.00000 52.16667
#2  Ben 30.50000 50.50000
#3  Cat 23.83333 48.66667

score 1 · Answer 2 · answered Feb 10 '20 at 15:30

library(tidyverse)

# exampel dataset
df = read.table(text = "
Name     Month  Rate1     Rate2
Aira       1      12        23
Aira       2      18        73
Aira       3      19        45
Ben        1      53        19
Ben        2      22        87
Ben        3      19        45
Cat        1      22        87
Cat        2      67        43
Cat        3      45        32
", header=T, stringsAsFactors=F)

# function that returns means of Rates after excluding a given name
AntiGroupMean = function(x) { df %>% filter(Name != x) %>% summarise_at(vars(matches("Rate")), mean) }

df %>%
  distinct(Name) %>%                         # for each name
  mutate(v = map(Name, AntiGroupMean)) %>%   # apply the function
  unnest(v)                                  # unnest results

# # A tibble: 3 x 3
#   Name  Rate1 Rate2
#   <chr> <dbl> <dbl>
# 1 Aira   38    52.2
# 2 Ben    30.5  50.5
# 3 Cat    23.8  48.7

score 1 · Accepted Answer · answered Feb 10 '20 at 15:30

1

One option could be:

df %>%
 mutate_at(vars(Rate1, Rate2), list(sum = ~ sum(.))) %>%
 mutate(rows = n()) %>%
 group_by(Name) %>%
 summarise(Rate1 = first((Rate1_sum - sum(Rate1))/(rows-n())),
           Rate2 = first((Rate2_sum - sum(Rate2))/(rows-n())))

  Name  Rate1 Rate2
  <chr> <dbl> <dbl>
1 Aira   38    52.2
2 Ben    30.5  50.5
3 Cat    23.8  48.7

Or in a less tidy form:

df %>%
 group_by(Name) %>%
 summarise(Rate1 = first((sum(df$Rate1) - sum(Rate1))/(nrow(df)-n())),
           Rate2 = first((sum(df$Rate2) - sum(Rate2))/(nrow(df)-n())))

answered Feb 10 '20 at 15:30

tmfmnk

38,881
4
47
67

Thanks! I think your solution is cleanest and most generalize-able. Thanks again. – tubaguy Feb 10 '20 at 15:43
It could be slightly problematic when NAs are present, but it could be adjusted :) – tmfmnk Feb 10 '20 at 15:44

IceCreamToucan · Answer 4 · 2020-02-10T16:10:53.147

You could compute this as a mean of the group means, weighted by the number of observations in each group, but with the given row's weight equal to 0.

library(dplyr)

df %>% 
  group_by(Name) %>% 
  summarise(n = n(), Rate1 = mean(Rate1), Rate2 = mean(Rate2)) %>% 
  mutate_at(vars(starts_with('Rate')),  ~
    sapply(Name, function(x) weighted.mean(.x, n*(Name != x))))

# A tibble: 3 x 4
  Name      n Rate1 Rate2
  <chr> <int> <dbl> <dbl>
1 Aira      3  38    52.2
2 Ben       3  30.5  50.5
3 Cat       3  23.8  48.7

akrun · Answer 5 · 2020-02-10T15:49:49.397

We can use

library(dplyr)
library(purrr)
map_dfr(unique(df1$Name), ~ 
   anti_join(df1, tibble(Name = .x)) %>% 
   summarise_at(vars(starts_with('Rate')), mean) %>%
   mutate(Name = .x)) %>%
   select(Name, everything())
#    Name    Rate1    Rate2
#1 Aira 38.00000 52.16667
#2  Ben 30.50000 50.50000
#3  Cat 23.83333 48.66667

data

df1 <- structure(list(Name = c("Aira", "Aira", "Aira", "Ben", "Ben", 
"Ben", "Cat", "Cat", "Cat"), Month = c(1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L), Rate1 = c(12L, 18L, 19L, 53L, 22L, 19L, 22L, 67L, 
45L), Rate2 = c(23L, 73L, 45L, 19L, 87L, 45L, 87L, 43L, 32L)), 
 class = "data.frame", row.names = c(NA, 
-9L))

Sorry, the tibble this code creates doesn't match my desired solution. — tubaguy, Feb 10 '20 at 15:38

arg0naut91 · Answer 6 · 2020-02-10T15:46:37.730

You could try:

library(dplyr)

df %>%
  mutate_at(
    vars(contains('Rate')),
    ~ sapply(1:n(), function(x) mean(.[Name %in% setdiff(unique(df$Name), Name[x])], na.rm = TRUE)
             )
  ) %>%
  distinct_at(vars(-Month))

Output:

  Name    Rate1    Rate2
1 Aira 38.00000 52.16667
2  Ben 30.50000 50.50000
3  Cat 23.83333 48.66667

(Though you are likely much better off with other solutions as sapply through rows would be REALLY slow on larger datasets)

Summarize a Variable by All But Group

6 Answers6

data