Average the periods 1-3 and 4-6 for each treatment and id for each variable and get the data in a new data frame

Question

Here I would like to average the periods 1-3 and 4-6 for each treatment and id for each variable and get the data in a new data frame. Does anyone know how I accomplish this?

set.seed(1)
id <- rep(1:2,each=6)
trt <- c("A","A","A","B", "B", "B","A","A","A","B", "B", "B")
period <- rep(1:6,2)
pointA <- sample(1:10,12, replace=TRUE)
pointB<- sample(1:10,12, replace=TRUE)
pointC<- sample(1:10,12, replace=TRUE)
df <- data.frame(id,trt,period,pointA, pointB,pointC)
head(df)

   id trt period pointA pointB pointC
1   1   A      1      3      7      3
2   1   A      2      4      4      4
3   1   A      3      6      8      1
4   1   B      4     10      5      4
5   1   B      5      3      8      9
6   1   B      6      9     10      4
7   2   A      1     10      4      5
8   2   A      2      7      8      6
9   2   A      3      7     10      5
10  2   B      4      1      3      2
11  2   B      5      3      7      9
12  2   B      6      2      2      7

I would like it to look like this:

  id trt Period pointA pointB pointC
1  1   A    123     13     19      8
2  1   B    456     21     23     17
3  2   A    456     24     22     16
4  2   B    123      6     12     18

Just FYI, you're saying average but showing sums. The calculation part should be covered [here](https://stackoverflow.com/q/9723208/5325862), and the string pasting [here](https://stackoverflow.com/q/15933958/5325862). Both are tasks covered pretty extensively already — camille, Feb 28 '20 at 16:32

score 1 · Accepted Answer · answered Feb 28 '20 at 16:30

1

Using dplyr you can create a new variable with the proper group, and then use that as a group_by. For example

library(dplyr)
df %>% 
  mutate(period_class = case_when(
    period %in% c(1,2,3)~"123",
    period %in% c(4,5,6)~"456")
  ) %>% 
  select(-period) %>% 
  group_by(id, trt, period_class) %>% 
  summarize_all(mean) # though you seem to have used `sum` in your example

answered Feb 28 '20 at 16:30

MrFlick

195,160
17
277
295

Why not just collapse the period strings so you aren't doing it manually? – camille Feb 28 '20 at 16:34
@camille How would you know which values to collapse? You would need some sort of grouping already. I'm curious what you have in mind. Maybe i'm missing something obvious – MrFlick Feb 28 '20 at 16:35
Oh, I see. In this example A only has 123 and B only has 456. The "desired ouput" doesn't match up to the actual data in this case. Yeah, of A or B had non-overlapping periods, then this would be much easier to collapse. I see now. – MrFlick Feb 28 '20 at 16:39
At least within the sample, treatment A is all 1, 2, or 3 and treatment B is all 4, 5, or 6. `df %>% group_by(id, trt) %>% mutate(period = paste(period, collapse = "")) ` Or if in the actual data that might vary by ID as well, include that as a grouping column – camille Feb 28 '20 at 16:39
Oh yeah, hadn't noticed the mismatch between actual and desired. Either way you should be good creating the period grouping based on the period column – camille Feb 28 '20 at 16:40
But if A and B did have values for all 1-6 then the collapsing wouldn’t work. It would collapse them all into 123456. – MrFlick Feb 28 '20 at 16:41

score 1 · Answer 2 · answered Feb 28 '20 at 16:41

Using data.table. I have attached two solutions, one for your example (sum) and one for your request (mean). Unsure why they differ in your question.

Code

library(data.table); setDT(df)

point_var = colnames(df) %like% 'point'

# (i) for the sum (as per your example):
dtsum = df[, lapply(.SD, sum), .SDcols = point_var, .(id, trt, pCat = ifelse(period > 3, 456, 123))] 


# (ii) for the mean (as per your request)
dtmean = df[, lapply(.SD, mean), .SDcols = point_var, .(id, trt, pCat = ifelse(period > 3, 456, 123))]

Output (i) Sum

> dtsum
   id trt pCat pointA pointB pointC
1:  1   A  123     13     19      8
2:  1   B  456     22     23     17
3:  2   A  123     24     22     16
4:  2   B  456      6     12     18

Output (ii) Mean

> dtmean
   id trt pCat   pointA   pointB   pointC
1:  1   A  123 4.333333 6.333333 2.666667
2:  1   B  456 7.333333 7.666667 5.666667
3:  2   A  123 8.000000 7.333333 5.333333
4:  2   B  456 2.000000 4.000000 6.000000

score 1 · Answer 3 · answered Feb 28 '20 at 16:55

You could do this concise in base R using aggregate and ave.

pstClps <- function(x) paste(x, collapse="")  # pre-define FUN

aggregate(. ~ id + trt + period, transform(df, period=ave(period, id, trt, FUN=pstClps)), sum)
#   id trt period pointA pointB pointC
# 1  1   A    123     13     19      8
# 2  2   A    123     24     22     16
# 3  1   B    456     22     23     17
# 4  2   B    456      6     12     18

Data:

df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L), trt = c("A", "A", "A", "B", "B", "B", "A", "A", "A", 
"B", "B", "B"), period = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 
4L, 5L, 6L), pointA = c(3L, 4L, 6L, 10L, 3L, 9L, 10L, 7L, 7L, 
1L, 3L, 2L), pointB = c(7L, 4L, 8L, 5L, 8L, 10L, 4L, 8L, 10L, 
3L, 7L, 2L), pointC = c(3L, 4L, 1L, 4L, 9L, 4L, 5L, 6L, 5L, 2L, 
9L, 7L)), row.names = c(NA, -12L), class = "data.frame")

Average the periods 1-3 and 4-6 for each treatment and id for each variable and get the data in a new data frame

3 Answers3