2

Here is a small reproducible example of my data:

> mydata <- structure(list(subject = c(1, 1, 1, 2, 2, 2), time = c(0, 1, 2, 0, 1, 2), measure = c(10, 12, 8, 7, 0, 0)), .Names = c("subject", "time", "measure"), row.names = c(NA, -6L), class = "data.frame")

> mydata

subject  time  measure
1          0      10
1          1      12
1          2       8
2          0       7
2          1       0
2          2       0

I would like to generate a new variable that is the "change from baseline". That is, I would like

subject  time  measure  change
1          0      10      0
1          1      12      2
1          2       8     -2
2          0       7      0
2          1       0     -7
2          2       0     -7

Is there an easy way to do this, other than looping through all the records programmatically or reshaping to wide format first?

LeelaSella

3 Answers

6

There are many possibilities. My favorites:

library(plyr)
ddply(mydata,.(subject),transform,change=measure-measure[1])

  subject time measure change
1       1    0      10      0
2       1    1      12      2
3       1    2       8     -2
4       2    0       7      0
5       2    1       0     -7
6       2    2       0     -7

library(data.table)
myDT <- as.data.table(mydata)
myDT[,change:=measure-measure[1],by=subject]
print(myDT)

   subject time measure change
1:       1    0      10      0
2:       1    1      12      2
3:       1    2       8     -2
4:       2    0       7      0
5:       2    1       0     -7
6:       2    2       0     -7

data.table is preferable if your dataset is large.
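Both snippets above take `measure[1]` as the baseline, so they assume the first row of each subject is the baseline row. A minimal sketch for data that may not arrive in that order, assuming the baseline is the earliest `time` per subject (the shuffled sample data here is made up for illustration):

```r
library(data.table)

# Same values as in the question, but with rows out of time order.
myDT <- data.table(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(2, 0, 1, 1, 2, 0),
  measure = c(8, 10, 12, 0, 0, 7)
)

# Sort by subject, then time, so the baseline is the first row per subject.
setorder(myDT, subject, time)

# Subtract each subject's first (baseline) measure, by reference.
myDT[, change := measure - measure[1], by = subject]
print(myDT)
```

`setorder` reorders the table in place, so no copy is made before the grouped assignment.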

Roland
  • @Arun I didn't make a statement about small datasets, did I? Although for the beginner data.table is kind of hard to grasp and it might be better to stay with conventional data.frames for the time being. – Roland Feb 09 '13 at 13:37
  • @Arun I don't know if it is still the case, but I remember there being a performance advantage with using `print`. In any case it is better syntax. – Roland Feb 09 '13 at 13:39
  • @Arun https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1001&group_id=240&atid=978 – Roland Feb 09 '13 at 13:53
  • @Arun and Roland, but that issue happens for data.frame too, iiuc. Since everyone learns not to type 'DF' on its own, nobody realises it's there. The time difference between 'DT' on its own and 'print(DT)', on the console, is what that issue is about. – Matt Dowle Feb 10 '13 at 23:21
4

What about:

mydata$change <- do.call("c", with(mydata, lapply(split(measure, subject), function(x) x - x[1])))

Alternatively, you could use the `ave` function:

with(mydata, ave(measure, subject, FUN=function(x) x - x[1]))
# [1]  0  2 -2  0 -7 -7

or

within(mydata, change <- ave(measure, subject, FUN=function(x) x - x[1]))
#   subject time measure change
# 1       1    0      10      0
# 2       1    1      12      2
# 3       1    2       8     -2
# 4       2    0       7      0
# 5       2    1       0     -7
# 6       2    2       0     -7
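
The `ave` calls above use `x[1]` as the baseline, which works when each subject's first row is the baseline. As a sketch for unsorted data, assuming the baseline is the row with the smallest `time` per subject (the `shuffled` and `sorted` names are only for illustration), one option is to pass row indices to `ave` and look up both columns inside `FUN`:

```r
# Same data as in the question, with the rows deliberately shuffled.
mydata <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)
shuffled <- mydata[c(3, 5, 1, 6, 2, 4), ]

# ave() hands FUN a single vector, so group the row indices and use them
# to look up measure and time for each subject.
shuffled$change <- with(shuffled, ave(
  seq_along(measure), subject,
  FUN = function(i) measure[i] - measure[i][which.min(time[i])]
))

# Reorder only for display; the change values are already aligned by row.
sorted <- shuffled[order(shuffled$subject, shuffled$time), ]
sorted
```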
flodel
johannes
1

You can use `tapply`:

mydata$change <- unlist(tapply(mydata$measure, mydata$subject, function(x) x - x[1]), use.names = FALSE)
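
Because `unlist` concatenates the per-subject results group by group, the line above lines up with the data frame only when the rows are already grouped by subject. A minimal sketch of an order-preserving variant using base R's `split` and `unsplit`, on made-up data where the subjects are interleaved:

```r
# Subjects interleaved; within each subject the rows are still in time order.
mydata <- data.frame(
  subject = c(1, 2, 1, 2, 1, 2),
  time    = c(0, 0, 1, 1, 2, 2),
  measure = c(10, 7, 12, 0, 8, 0)
)

# Split measure by subject and subtract each subject's first value.
pieces  <- split(mydata$measure, mydata$subject)
changes <- lapply(pieces, function(x) x - x[1])

# unsplit() puts each value back at the row it came from.
mydata$change <- unsplit(changes, mydata$subject)
mydata
```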
Aditya Sihag