How to create row/subject index in longitudinal data

Question

I have longitudinal data formatted as below. I want to create an index for each subject based on their 'disease' status. If disease status is NA or 0, the index should be 0; if disease status is 1, the index should be 1.

However, I want every row within the same subject to have the same index, regardless of the observation point at which the status was recorded. That is, as long as the individual has 'disease=1' in any row, the index should be 1 for all rows of that individual.

Does anybody have a good idea? Thanks!

id disease  index
1    NA      0
1    NA      0
1    NA      0
2    NA      1
2     1      1
2     1      1
3    NA      1
3    NA      1
3     1      1
4     1      1
4     0      1
4     0      1
5     0      0
5     0      0
5     0      0

Welcome to SO. For some reason your image (which had a screencap of your data?) did not appear. This is probably for the best: all SO questions should include a reproducible example. One way to do this is to include code which generates a fake data set with the properties of your real data set. Another option is to use `dput` to paste your objects into your question. More here: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Drew Steen, Sep 25 '13 at 19:33
Hello and welcome to SO. To help make a reproducible example, you can use `reproduce()` . Instructions are here: http://bit.ly/SORepro . — Ricardo Saporta, Sep 25 '13 at 19:37

score 2 · Accepted Answer · answered Sep 25 '13 at 19:43

Assuming `dat` is your data:
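For a reproducible starting point, `dat` can be rebuilt from the table in the question. This is only a sketch: it assumes the real data has the same two columns, `id` and `disease`, and leaves out the desired `index` column so that the code below can create it.

```r
# Rebuild the example from the question: 5 subjects, 3 observations each.
# Column names and values are copied from the posted table; the real
# data set is assumed to have the same layout.
dat <- data.frame(
  id      = rep(1:5, each = 3),
  disease = c(NA, NA, NA,   # subject 1: never observed with disease
              NA,  1,  1,   # subject 2
              NA, NA,  1,   # subject 3
               1,  0,  0,   # subject 4
               0,  0,  0)   # subject 5: observed, never diseased
)
```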

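Using `data.table`:

library(data.table)
DT <- as.data.table(dat)

# index is 1 for every row of a subject with at least one disease >= 1; NA counts as no disease
DT[, index := as.numeric(sum(disease >= 1, na.rm=TRUE) > 0), by=id]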

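Using base `R`:

# one flag per subject: 1 if any observation has disease >= 1, otherwise 0
INDX <- tapply(dat$disease, dat$id, function(x)
             as.numeric(sum(x >= 1, na.rm=TRUE) > 0))

# turn the per-subject flags into a lookup table and attach them to every row by id
INDX <- data.frame(id=names(INDX), index=INDX)
dat <- merge(dat, INDX)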
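A more compact base R variant, which is not part of the answer above, can be sketched with `ave`. It applies a function within each `id` group and returns a vector as long as the input, so no `merge` step is needed. The example data is rebuilt from the question's table so the block runs on its own; `any(..., na.rm = TRUE)` is swapped in for the `sum(...) > 0` test and should give the same flag.

```r
# Example data copied from the question (assumed layout: id, disease).
dat <- data.frame(
  id      = rep(1:5, each = 3),
  disease = c(NA, NA, NA, NA, 1, 1, NA, NA, 1, 1, 0, 0, 0, 0, 0)
)

# For each subject, flag 1 if any observation has disease >= 1, else 0.
# The single value returned per group is recycled across that group's rows.
dat$index <- ave(dat$disease, dat$id,
                 FUN = function(x) as.numeric(any(x >= 1, na.rm = TRUE)))

dat$index
# [1] 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0
```

The result matches the `index` column requested in the question: subjects 2, 3 and 4 are flagged on all rows, while subjects 1 (all NA) and 5 (all 0) are not.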
