How to find the average of specific columns for specific intervals of data in R?

Question

I have a data frame df with more than 500 rows and 11 columns (sample below)

Subject    Group    RA    TA    Trial   VA1    VA2    VA3    VB1    VB2    VB3
1          CAI      0     1     1       6.289  5.924  7.194  4.058  9.770  -3.479
1          CAI      0     1     2       10.512 6.376  8.058  1.905  8.031  -10.698
1          CAI      0     1     3       6.554  5.046  9.482  1.825  7.152  -9.608
1          CAI      0     2     1       8.235  3.006  8.054  2.816  9.911  -4.125
1          CAI      0     2     2       6.744  2.176  9.751  3.566  12.217 -8.225
2          CAI      45    1     1       5.321  5.781  7.821  4.402  6.420  4.125
2          CAI      45    2     1       7.594  4.472  8.053  5.226  5.417  2.481
2          CAI      45    3     1       5.918  5.771  10.154 4.602  5.727  2.092
2          CAI      45    3     2       6.884  5.292  9.180  4.323  4.449  2.191

I am trying to reduce the data set by getting the average of all the trials per condition. In this case, the "Trial" column indicates the trial number for the condition. So for the condition "Subject = 1 , Group = CAI, RA = 0, TA = 1," there are three trials, and I want to get the averages of VA1 to VB3 for the three trials.

However, number of trials are not consistent per condition.

I wanted to see something like this:

Subject    Group    RA    TA    VA1    VA2    VA3    VB1    VB2    VB3
1          CAI      0     1     7.785  5.782  8.244  2.596  8.317  -7.928
1          CAI      0     2     7.490  2.591  8.902  3.191  11.064 -6.175
2          CAI      45    1     5.321  5.781  7.821  4.402  6.420  4.125
2          CAI      45    2     7.594  4.472  8.053  5.226  5.417  2.481
2          CAI      45    3     6.401  5.532  9.667  4.463  5.088  2.142

This is the code I've currently come up with. It seems to work, however, there might be room to improve, and the "Group" column is missing, so I wanted to include it in the output data frame.

attach(df)

factor(Subject)
factor(Group)
factor(RA)
factor(TA)

ndf = c()
SubjList <- list(1,2)
RAList <- list(0, 45, -45)
TAList <- list(1,2,3,4)

for (S in SubjList){
  for (R in RAList){
    for (T in TAList){
      TEMP <- subset(M, (Subject == S & RA == R & TA ==T ))
      mVA1 = mean(TEMP$VA1)
      mVA2 = mean(TEMP$VA2)
      mVA3 = mean(TEMP$VA3)
      mVB1 = mean(TEMP$VB1)
      mVB2 = mean(TEMP$VB2)
      mVB3 = mean(TEMP$VB3)
      
      lst = c(S, R, T, mVA1, mVA2, mVA3, mVB1, mVB2, mVB3)
      ndf <- rbind(ndf, lst) 
    }
  }
}

I would love to know what you think, and thank you in advance for your help!

Try `aggregate(cbind(VA1, VA2, VA3, VB1, VB2, VB3) ~ Subject + Group + RA + TA, M, mean)` — jay.sf, Dec 09 '22 at 07:11

Bensstats · Answer 1 · 2022-12-09T13:13:56.913

Update: Thank you zephyrl for using summarize(across(VA1:VB3, mean))


library(dplyr)
# Your table
df<-read.table(header=TRUE,
           text="
Subject    Group    RA    TA    Trial   VA1    VA2    VA3    VB1    VB2    VB3
1          CAI      0     1     1       6.289  5.924  7.194  4.058  9.770  -3.479
1          CAI      0     1     2       10.512 6.376  8.058  1.905  8.031  -10.698
1          CAI      0     1     3       6.554  5.046  9.482  1.825  7.152  -9.608
1          CAI      0     2     1       8.235  3.006  8.054  2.816  9.911  -4.125
1          CAI      0     2     2       6.744  2.176  9.751  3.566  12.217 -8.225
2          CAI      45    1     1       5.321  5.781  7.821  4.402  6.420  4.125
2          CAI      45    2     1       7.594  4.472  8.053  5.226  5.417  2.481
2          CAI      45    3     1       5.918  5.771  10.154 4.602  5.727  2.092
2          CAI      45    3     2       6.884  5.292  9.180  4.323  4.449  2.191
           ")

df %>% 
  group_by(Subject,Group,RA,TA) %>% 
  summarize(across(VA1:VB3, mean))

# A tibble: 5 × 10
# Groups:   Subject, Group, RA [2]
  Subject Group    RA    TA   VA1   VA2   VA3   VB1   VB2   VB3
    <int> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1       1 CAI       0     1  7.78  5.78  8.24  2.60  8.32 -7.93
2       1 CAI       0     2  7.49  2.59  8.90  3.19 11.1  -6.18
3       2 CAI      45     1  5.32  5.78  7.82  4.40  6.42  4.12
4       2 CAI      45     2  7.59  4.47  8.05  5.23  5.42  2.48
5       2 CAI      45     3  6.40  5.53  9.67  4.46  5.09  2.14

You forgot to group by subject. The result here is correct though since the group and Subject are level equivalent, but might not be the case for the whole dataset. Probably subject 10 is also of group 0 etc. So you still need to group by Subject — Onyambu, Dec 09 '22 at 04:53

How to find the average of specific columns for specific intervals of data in R?

1 Answers1