Make operations on a subset of a dataset

Question

Consider the following dataset:

a <- c(1,23,18,47,15,56,67,43,9)
b <- c("A","B","B","C","C","B","D","A","C")
df <- data.frame(var1=a, var2=b)

I need to run a function (for example mean()) on subsets of df, defined by the value of var2, like this:

df_A <- subset(df,var2=="A")
mean_A <- mean(df_A$var1)

df_B <- subset(df,var2=="B")
mean_B <- mean(df_B$var1)

df_C <- subset(df,var2=="C")
mean_C <- mean(df_C$var1)

df_D <- subset(df,var2=="D")
mean_D <- mean(df_D$var1)

The main difficulty is that I don't know in advance how many different values var2 contains. In my example there are 4 possibilities: "A", "B", "C" and "D". But in reality it varies: sometimes a dataset has 2 different values in var2, sometimes 15, sometimes more...

I think a loop could be a good solution but I am a bit lost...

Can you please help? Thanks in advance.

`tapply(df$var1, df$var2, FUN=mean)` https://stackoverflow.com/questions/3505701/grouping-functions-tapply-by-aggregate-and-the-apply-family — jogo, Oct 28 '19 at 13:47

score 2 · Accepted Answer · answered Oct 28 '19 at 13:52

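The easiest way is probably to use the dplyr package: group_by() creates one group per distinct value of var2, however many there are, and summarise() computes the mean within each group.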

a <- c(1,23,18,47,15,56,67,43,9)
b <- c("A","B","B","C","C","B","D","A","C")
df <- data.frame(var1=a, var2=b)

library(dplyr)
df2 <- df %>% 
  group_by(var2) %>% 
  summarise(mean=mean(var1))
df2

#output
# # A tibble: 4 x 2
# var2   mean
# <fct> <dbl>
# 1 A      22  
# 2 B      32.3
# 3 C      23.7
# 4 D      67
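
If adding a dependency is not an option, the same grouping can be sketched in base R. This is a minimal illustration using the question's data, not part of the original answer; the object names means_tapply, means_agg, means_loop and groups are made up for the example. It shows tapply() (as suggested in the comment on the question), aggregate(), and the explicit loop the question had in mind. None of the three needs to know in advance how many distinct values var2 has.

```r
a <- c(1, 23, 18, 47, 15, 56, 67, 43, 9)
b <- c("A", "B", "B", "C", "C", "B", "D", "A", "C")
df <- data.frame(var1 = a, var2 = b)

# tapply(): returns the group means as a vector named by group
means_tapply <- tapply(df$var1, df$var2, FUN = mean)

# aggregate(): returns a data frame with one row per group
means_agg <- aggregate(var1 ~ var2, data = df, FUN = mean)

# Explicit loop over whatever distinct values var2 happens to contain
# (as.character() so this also works when var2 is stored as a factor)
groups <- as.character(unique(df$var2))
means_loop <- numeric(length(groups))
names(means_loop) <- groups
for (g in groups) {
  means_loop[g] <- mean(df$var1[df$var2 == g])
}

means_tapply
means_agg
means_loop
```

The loop works, but tapply() and aggregate() express the same idea in one call, and replacing mean with another summary function only requires changing the FUN argument.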

LouisMP

Glad to help. If this answered the question to your satisfaction, you could mark the answer as accepted (the checkmark under the voting buttons) to let others know the question has been answered. – LouisMP Oct 28 '19 at 14:09
