4

I have a data.frame and I need to calculate the mean per "anti-group" (i.e. per Name, below).

Name     Month  Rate1     Rate2
Aira       1      12        23
Aira       2      18        73
Aira       3      19        45
Ben        1      53        19
Ben        2      22        87
Ben        3      19        45
Cat        1      22        87
Cat        2      67        43
Cat        3      45        32

My desired output is like below, where the values for Rate1 and Rate2 are the means of the column's values not found in each group. Please disregard the value, I have made it up for the example. I'd prefer to do this using dplyr if possible.

Name    Rate1   Rate2
Aira    38      52.2
Ben     30.5    50.5
Cat     23.8    48.7

Any help much appreciated! Thank you!

PS - Thanks to Ianthe for copying their question and their question's data but changing the question slightly. (Mean per group in a data.frame)

tubaguy
  • 149
  • 11
  • 2
    what did you try? You are tagging `dplyr` so did you try `summarise_all`, `summarise_at`, etc...? – Sotos Feb 10 '20 at 15:25
  • 1
    Please make an earnest attempt so we can help with implementation and helps your learning curve. – Parfait Feb 10 '20 at 15:26
  • Well if I wanted per group, I could easily do the following: `df %>% group_by(Name) %>% summarize(Rate1=mean(Rate1), Rate2=mean(Rate2))` but that calculates the mean of the Rate columns by group. I want to calculate the mean of the rate columns by everything **but** the group. – tubaguy Feb 10 '20 at 15:32

6 Answers6

2

Here is another idea via base R,

do.call(rbind, lapply(unique(df$Name), function(i)colMeans(df[!df$Name %in% i,-c(1:2)])))

#        Rate1    Rate2
#[1,] 38.00000 52.16667
#[2,] 30.50000 50.50000
#[3,] 23.83333 48.66667

or complete with Name,

cbind.data.frame(Name = unique(df$Name), res1)

#  Name    Rate1    Rate2
#1 Aira 38.00000 52.16667
#2  Ben 30.50000 50.50000
#3  Cat 23.83333 48.66667
Sotos
  • 51,121
  • 6
  • 32
  • 66
1
library(tidyverse)

# exampel dataset
df = read.table(text = "
Name     Month  Rate1     Rate2
Aira       1      12        23
Aira       2      18        73
Aira       3      19        45
Ben        1      53        19
Ben        2      22        87
Ben        3      19        45
Cat        1      22        87
Cat        2      67        43
Cat        3      45        32
", header=T, stringsAsFactors=F)

# function that returns means of Rates after excluding a given name
AntiGroupMean = function(x) { df %>% filter(Name != x) %>% summarise_at(vars(matches("Rate")), mean) }

df %>%
  distinct(Name) %>%                         # for each name
  mutate(v = map(Name, AntiGroupMean)) %>%   # apply the function
  unnest(v)                                  # unnest results

# # A tibble: 3 x 3
#   Name  Rate1 Rate2
#   <chr> <dbl> <dbl>
# 1 Aira   38    52.2
# 2 Ben    30.5  50.5
# 3 Cat    23.8  48.7
AntoniosK
  • 15,991
  • 2
  • 19
  • 32
1

One option could be:

df %>%
 mutate_at(vars(Rate1, Rate2), list(sum = ~ sum(.))) %>%
 mutate(rows = n()) %>%
 group_by(Name) %>%
 summarise(Rate1 = first((Rate1_sum - sum(Rate1))/(rows-n())),
           Rate2 = first((Rate2_sum - sum(Rate2))/(rows-n())))

  Name  Rate1 Rate2
  <chr> <dbl> <dbl>
1 Aira   38    52.2
2 Ben    30.5  50.5
3 Cat    23.8  48.7

Or in a less tidy form:

df %>%
 group_by(Name) %>%
 summarise(Rate1 = first((sum(df$Rate1) - sum(Rate1))/(nrow(df)-n())),
           Rate2 = first((sum(df$Rate2) - sum(Rate2))/(nrow(df)-n())))
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
1

You could compute this as a mean of the group means, weighted by the number of observations in each group, but with the given row's weight equal to 0.

library(dplyr)

df %>% 
  group_by(Name) %>% 
  summarise(n = n(), Rate1 = mean(Rate1), Rate2 = mean(Rate2)) %>% 
  mutate_at(vars(starts_with('Rate')),  ~
    sapply(Name, function(x) weighted.mean(.x, n*(Name != x))))

# A tibble: 3 x 4
  Name      n Rate1 Rate2
  <chr> <int> <dbl> <dbl>
1 Aira      3  38    52.2
2 Ben       3  30.5  50.5
3 Cat       3  23.8  48.7
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38
0

We can use

library(dplyr)
library(purrr)
map_dfr(unique(df1$Name), ~ 
   anti_join(df1, tibble(Name = .x)) %>% 
   summarise_at(vars(starts_with('Rate')), mean) %>%
   mutate(Name = .x)) %>%
   select(Name, everything())
#    Name    Rate1    Rate2
#1 Aira 38.00000 52.16667
#2  Ben 30.50000 50.50000
#3  Cat 23.83333 48.66667

data

df1 <- structure(list(Name = c("Aira", "Aira", "Aira", "Ben", "Ben", 
"Ben", "Cat", "Cat", "Cat"), Month = c(1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L), Rate1 = c(12L, 18L, 19L, 53L, 22L, 19L, 22L, 67L, 
45L), Rate2 = c(23L, 73L, 45L, 19L, 87L, 45L, 87L, 43L, 32L)), 
 class = "data.frame", row.names = c(NA, 
-9L))
akrun
  • 874,273
  • 37
  • 540
  • 662
0

You could try:

library(dplyr)

df %>%
  mutate_at(
    vars(contains('Rate')),
    ~ sapply(1:n(), function(x) mean(.[Name %in% setdiff(unique(df$Name), Name[x])], na.rm = TRUE)
             )
  ) %>%
  distinct_at(vars(-Month))

Output:

  Name    Rate1    Rate2
1 Aira 38.00000 52.16667
2  Ben 30.50000 50.50000
3  Cat 23.83333 48.66667

(Though you are likely much better off with other solutions as sapply through rows would be REALLY slow on larger datasets)

arg0naut91
  • 14,574
  • 2
  • 17
  • 38