
I would like to get the mean of a variable according to the group it belongs to. Here is a reproducible example.

gender <- c("M","F","M","F")
vec1 <- c(1:4)
vec2 <- c(10:13)

df <- data.frame(vec1,vec2,gender)
variables <- names(df)
variables <- variables[-3]
# Desired result
mean1 <- c(mean(c(1,3)),mean(c(2,4)))
mean2 <- c(mean(c(10,12)),mean(c(11,13)))
gender <- c("M","F") 
result <- data.frame(gender,mean1,mean2)

How can I achieve such a result? My dataset is quite big, so I would like to use the vector variables, which contains the names of the variables to be summarized, instead of writing out each variable by hand.

outofthegreen

3 Answers


Using aggregate.

## formula notation
aggregate(cbind(vec1, vec2) ~ gender, df, FUN=mean)
#   gender vec1 vec2
# 1      F    3   12
# 2      M    2   11

## list notation
with(df, aggregate(list(mean=cbind(vec1, vec2)), list(gender=gender), mean))
#   gender mean.vec1 mean.vec2
# 1      F         3        12
# 2      M         2        11

If you get an error with the formula notation, it is probably because you have assigned another object to the name mean. In that case, remove it with rm(mean).
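To pass a vector of column names instead of typing each one, the data-frame method of aggregate can take the columns selected by name. This is a minimal sketch assuming the question's df; building variables with setdiff() is my own choice (the question drops column 3 by position). A side effect is that the output keeps the original names vec1 and vec2 rather than generic ones.

```r
# Data from the question
gender <- c("M", "F", "M", "F")
df <- data.frame(vec1 = 1:4, vec2 = 10:13, gender = gender)

# Character vector of the columns to summarise (every column except "gender")
variables <- setdiff(names(df), "gender")

# Data-frame method of aggregate: df[variables] selects the columns by name,
# and the result keeps those names (vec1, vec2)
result <- aggregate(df[variables], by = list(gender = df$gender), FUN = mean)
result
#   gender vec1 vec2
# 1      F    3   12
# 2      M    2   11
```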

jay.sf
  • Thank you for your swift answer. In my case, I have many variables whose means I wish to extract. I have a vector containing the names of all these variables (namely "variables.vector"), but I cannot use "aggregate(list(mean=variables.vector)". Do you know a workaround so I don't have to write out each column name manually? Also, aggregate does not seem to keep the names of the variables; instead it names them generically: "mean", "mean.1", "mean.2", etc. How can this be solved? Thanks! – outofthegreen Oct 10 '20 at 09:26
  • @JeandeLéry You may want to update your example to make it more similar to your real data. – jay.sf Oct 10 '20 at 09:29
  • I have edited the data.frame; is my request clearer? Is there a more elegant way to use summarise on several variables than writing them all out manually? – outofthegreen Oct 10 '20 at 09:57
  • @JeandeLéry Yes, see update pls. – jay.sf Oct 10 '20 at 10:09
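The update mentioned here is not reproduced on this page. If you want to keep the formula notation without typing each column, one option is to build the formula from the names vector. This is a sketch assuming the same df and variables as in the question; fml is a hypothetical helper name.

```r
gender <- c("M", "F", "M", "F")
df <- data.frame(vec1 = 1:4, vec2 = 10:13, gender = gender)
variables <- setdiff(names(df), "gender")

# Build "cbind(vec1, vec2) ~ gender" from the names vector instead of typing it
fml <- as.formula(paste0("cbind(", paste(variables, collapse = ", "), ") ~ gender"))

# Formula method of aggregate; the cbind() columns keep their names
result <- aggregate(fml, data = df, FUN = mean)
result
```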

Use the dplyr package:

library(dplyr)

gender <- c("M","F","M","F")
# The unnamed first column is auto-named X1.4 by data.frame()
df <- data.frame(1:4, gender)

df %>% 
  group_by(gender) %>% 
  summarise(mean = mean(X1.4))
  • Thanks for your answer. Would you mind checking the update to my example? I wish to summarize several variables using a vector containing the names of all these variables. – outofthegreen Oct 10 '20 at 09:58

A dplyr solution

library(dplyr)
# all_of() avoids tidyselect's deprecation warning for external name vectors
df %>%
  group_by(gender) %>%
  summarise(across(all_of(variables), list(mean = mean), .names = "{.fn}_{.col}"))

Output

# A tibble: 2 x 3
  gender mean_vec1 mean_vec2
  <chr>      <dbl>     <dbl>
1 F              3        12
2 M              2        11
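If you would rather keep the original column names (vec1, vec2) than get mean_vec1 and mean_vec2, one option, assuming dplyr 1.0.0 or later for across(), is to pass mean as a bare function instead of a named list. With a single function, across() names the outputs "{.col}" by default.

```r
library(dplyr)

gender <- c("M", "F", "M", "F")
df <- data.frame(vec1 = 1:4, vec2 = 10:13, gender = gender)
variables <- setdiff(names(df), "gender")

# A bare function (not a named list) keeps the original column names
result <- df %>%
  group_by(gender) %>%
  summarise(across(all_of(variables), mean))
result
```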
ekoam