I have a dataset in R that contains repeated observations over time. Each subject has up to 4 rows, and the variables are ID, Time, and X, which is numeric (though it could also be categorical for the sake of the question). I want to compute the change from baseline for each row, by ID. Until now I did this in SAS, with the following code:

data want;
set have;
by ID;
retain baseline;
if first.ID then baseline = X;
Change = X - baseline;
run;

My question is: how do I do this in R? Thank you in advance.

Example dataset (in SAS, since I don't know how to create it in R):

data have;
input ID Time X;
datalines;
1 1 5
1 2 6
1 3 8
1 4 9
2 1 2
2 2 2
2 3 7
2 4 0
3 1 1
3 2 4
3 3 5
;
run;
Scarabee
user2899944
  • Please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Roman Luštrik Oct 08 '16 at 09:28

1 Answer

Generate some example data (x is drawn at random, so your values will differ from the ones shown):

dta <- data.frame(id = rep(1:3, each=4), time = rep(1:4, 3), x = rnorm(12))

# > dta
# id time            x
# 1   1    1 -0.232313499
# 2   1    2  1.116983376
# 3   1    3 -0.682125947
# 4   1    4 -0.398029820
# 5   2    1  0.440525082
# 6   2    2  0.952058966
# 7   2    3  0.690180586
# 8   2    4 -0.995872696
# 9   3    1  0.009735667
# 10  3    2  0.556254340
# 11  3    3 -0.064571775
# 12  3    4 -1.003582676

I use the dplyr package for this. It is not part of a default R installation, so install it first with install.packages("dplyr") if you don't have it yet.

The steps are: group the data by id, so that the following operations are done per group; sort the data by id and time, so that the first record of each group is the baseline; then calculate a new column holding the difference between x and the first value of x in the group. The result is stored in a new data.frame, dta_new, but it can of course also be assigned back to dta.

library(dplyr)

dta_new <- dta %>% group_by(id) %>% arrange(id, time) %>% 
  mutate(change = x - first(x))


# > dta_new
# Source: local data frame [12 x 4]
# Groups: id [3]
# 
# id  time            x      change
# <int> <int>        <dbl>       <dbl>
# 1      1     1 -0.232313499  0.00000000
# 2      1     2  1.116983376  1.34929688
# 3      1     3 -0.682125947 -0.44981245
# 4      1     4 -0.398029820 -0.16571632
# 5      2     1  0.440525082  0.00000000
# 6      2     2  0.952058966  0.51153388
# 7      2     3  0.690180586  0.24965550
# 8      2     4 -0.995872696 -1.43639778
# 9      3     1  0.009735667  0.00000000
# 10     3     2  0.556254340  0.54651867
# 11     3     3 -0.064571775 -0.07430744
# 12     3     4 -1.003582676 -1.01331834
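
If you would rather not install a package, the same idea can be tried in base R on the data from the question. This is only a minimal sketch: it assumes the baseline is the row with the smallest Time within each ID, and the names have, want and Change are simply carried over from the SAS code.

```r
# Data from the question, rebuilt as an R data frame
# (column names follow the SAS example).
have <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  Time = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3),
  X    = c(5, 6, 8, 9, 2, 2, 7, 0, 1, 4, 5)
)

# Sort by ID and Time so the first row of each ID is the baseline
# (assumption: the earliest Time is the baseline visit).
want <- have[order(have$ID, have$Time), ]

# ave() applies the function within each ID and returns a vector
# aligned with the rows of want.
want$Change <- ave(want$X, want$ID, FUN = function(v) v - v[1])

print(want)
```

For ID 1 this gives changes of 0, 1, 3 and 4, which is what the SAS data step produces. The dplyr pipeline above should give the same column when applied to this data frame with the column names adjusted.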
Jan van der Laan