
Consider the following dataset:

a <- c(1,23,18,47,15,56,67,43,9)
b <- c("A","B","B","C","C","B","D","A","C")
df <- data.frame(var1=a, var2=b)

I need to run a function (for example mean()) on subsets of df, split by the value of var2, like this:

df_A <- subset(df,var2=="A")
mean_A <- mean(df_A$var1)

df_B <- subset(df,var2=="B")
mean_B <- mean(df_B$var1)

df_C <- subset(df,var2=="C")
mean_C <- mean(df_C$var1)

df_D <- subset(df,var2=="D")
mean_D <- mean(df_D$var1)

The big difficulty is that I don't know in advance how many different values var2 has. In my example there are 4: "A", "B", "C" and "D". In reality the number varies: sometimes a dataset has 2 different values in var2, sometimes 15, sometimes more.

I think a loop could be a good solution, but I am a bit lost.

Can you please help? Thanks in advance.
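For reference, the loop approach mentioned in the question does work without knowing the groups in advance, as long as it iterates over the distinct values of var2 instead of hard-coding them. A minimal sketch, assuming the df built above; the names groups and means are illustrative and not from the original post:

```r
a <- c(1, 23, 18, 47, 15, 56, 67, 43, 9)
b <- c("A", "B", "B", "C", "C", "B", "D", "A", "C")
df <- data.frame(var1 = a, var2 = b)

# Distinct values of var2, whatever their number (as.character guards
# against var2 being a factor in R < 4.0)
groups <- sort(unique(as.character(df$var2)))

# One named slot per group, filled in the loop
means <- numeric(length(groups))
names(means) <- groups
for (g in groups) {
  means[g] <- mean(df$var1[df$var2 == g])
}
means
```

The vectorised answers below do the same thing with less code, but the loop makes the "one subset per distinct value" idea explicit.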

Jaap
Remi
    `tapply(df$var1, df$var2, FUN=mean)` https://stackoverflow.com/questions/3505701/grouping-functions-tapply-by-aggregate-and-the-apply-family – jogo Oct 28 '19 at 13:47
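The tapply() one-liner from the comment, applied to the example data. This is a sketch using only base R; the variable name means is illustrative:

```r
a <- c(1, 23, 18, 47, 15, 56, 67, 43, 9)
b <- c("A", "B", "B", "C", "C", "B", "D", "A", "C")
df <- data.frame(var1 = a, var2 = b)

# tapply() splits var1 by the values of var2 and applies mean() to each piece,
# so the number of groups never has to be known in advance
means <- tapply(df$var1, df$var2, FUN = mean)
means
# A: 22, B: about 32.33, C: about 23.67, D: 67
```

The result is a named vector, so a single group can be read back with means["B"].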


The easiest way is to use the dplyr package:

a <- c(1,23,18,47,15,56,67,43,9)
b <- c("A","B","B","C","C","B","D","A","C")
df <- data.frame(var1=a, var2=b)

library(dplyr)
df2 <- df %>% 
  group_by(var2) %>% 
  summarise(mean=mean(var1))
df2

#output
# # A tibble: 4 x 2
# var2   mean
# <fct> <dbl>
# 1 A      22  
# 2 B      32.3
# 3 C      23.7
# 4 D      67
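If you would rather not add a dependency, base R's aggregate() gives the same table as a plain data frame. A minimal sketch, assuming the same df; the name res and the renamed mean column are illustrative choices made to mirror the dplyr output:

```r
a <- c(1, 23, 18, 47, 15, 56, 67, 43, 9)
b <- c("A", "B", "B", "C", "C", "B", "D", "A", "C")
df <- data.frame(var1 = a, var2 = b)

# The formula var1 ~ var2 means "summarise var1 within each value of var2";
# one row is returned per distinct var2 value
res <- aggregate(var1 ~ var2, data = df, FUN = mean)
names(res)[2] <- "mean"  # rename to match the dplyr column name
res
```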

LouisMP
    Glad to help. If this answered your question, you could mark the answer as accepted (the checkmark under the voting buttons) to let others know the question has been answered. – LouisMP Oct 28 '19 at 14:09