3

For example,

> set.seed(1)
 df1 <- data.frame(ID = c(rep(c(rep(1,3), rep(2,3)),2),rep(c(rep(3,3), rep(4,3)),2)),
                     Day=rep(c(1,2,3),8))
 df2 <- data.frame(measure = c(rep("mean",6),rep("median",6),rep("mean",6),rep("median",6)),
                     val=sample(1:24,24))

 data <- cbind(df1,df2)

> data

    ID Day measure val
1   1   1    mean   7
2   1   2    mean   9
3   1   3    mean  13
4   2   1    mean  20
5   2   2    mean   5
6   2   3    mean  18
7   1   1  median  19
8   1   2  median  12
9   1   3  median  11
10  2   1  median   1
11  2   2  median   3
12  2   3  median  14
13  3   1    mean  23
14  3   2    mean  21
15  3   3    mean   8
16  4   1    mean  16
17  4   2    mean   6
18  4   3    mean  24
19  3   1  median  22
20  3   2  median   4
21  3   3  median  17
22  4   1  median  15
23  4   2  median   2
24  4   3  median  10

I want to create another variable that measures the change from day 1 for each measure within each ID so

    ID Day measure val change
1   1   1    mean   7    0
2   1   2    mean   9    2
3   1   3    mean  13    6
4   2   1    mean  20    0
5   2   2    mean   5  -15
6   2   3    mean  18   -2
7   1   1  median  19    0
8   1   2  median  12   -7
9   1   3  median  11   -8
10  2   1  median   1    0
11  2   2  median   3    2
12  2   3  median  14   13
13  3   1    mean  23    0
14  3   2    mean  21   -2
15  3   3    mean   8   -15
16  4   1    mean  16    0
17  4   2    mean   6   -10
18  4   3    mean  24    8
19  3   1  median  22    0
20  3   2  median   4   -18
21  3   3  median  17   -5
22  4   1  median  15    0
23  4   2  median   2   -13
24  4   3  median  10   -5

I've been trying to modify the code in Calculating change from baseline with data in long format but there are repeated measures in my dataset.

Community
  • 1
  • 1
Dale Wu
  • 199
  • 1
  • 11

1 Answers1

4

We could use data.table to create 'change' column. Convert the 'data.frame' to 'data.table' (setDT(data)), Grouped by 'ID', 'measure', we calculate the difference between 'val' and the 'val' corresponding to 'Day' 1 to create the 'change'.

library(data.table)
setDT(data)[, change:= val-val[Day==1L], by = .(ID, measure)]
data
#    ID Day measure val change
# 1:  1   1    mean   7      0
# 2:  1   2    mean   9      2
# 3:  1   3    mean  13      6
# 4:  2   1    mean  20      0
# 5:  2   2    mean   5    -15
# 6:  2   3    mean  18     -2
# 7:  1   1  median  19      0
# 8:  1   2  median  12     -7
# 9:  1   3  median  11     -8
#10:  2   1  median   1      0
#11:  2   2  median   3      2
#12:  2   3  median  14     13
#13:  3   1    mean  23      0
#14:  3   2    mean  21     -2
#15:  3   3    mean   8    -15
#16:  4   1    mean  16      0
#17:  4   2    mean   6    -10
#18:  4   3    mean  24      8
#19:  3   1  median  22      0
#20:  3   2  median   4    -18
#21:  3   3  median  17     -5
#22:  4   1  median  15      0
#23:  4   2  median   2    -13
#24:  4   3  median  10     -5

A similar option using dplyr would be

library(dplyr)
data %>% 
   group_by(ID, measure) %>%
   mutate(change = val- val[Day==1L])

Or a base R option with ave if the 'Day' column is ordered

 data$change <- with(data, val-ave(val, ID, measure, FUN=function(x) head(x,1)))

Or another base R option without grouping if the columns are ordered

 data$change <- with(data, {i <- Day==1L; val-(val*i)[val*i>0][cumsum(i)] }) 
akrun
  • 874,273
  • 37
  • 540
  • 662