I have a large dataset of observations, with several observations in rows and several different variables for each ID. e.g.
Data
ID V1 V2 V3 time
1 35 100 5.2 2015-07-03 07:49
2 25 111 6.2 2015-04-01 11:52
3 41 120 NA 2015-04-01 14:17
1 25 NA NA 2015-07-03 07:51
2 NA 122 6.2 2015-04-01 11:50
3 40 110 4.1 2015-04-01 14:25
I would like to extract the earliest (first) observation for each variable independently based on the time column, for each unique ID. i.e. I would like to combine multiple rows of the same ID together so that I have one row of the first observation for each variable (time variable will not be equal for all).
The min()
function will return the earliest time for a set of observations, but the problem is I need to do this for each variable. To do this I have tried using the tapply
function with minimum time
tapply(Data, ID, min(time)
but get an error saying
"Error in match.fun(FUN) : 'min(Data$time)' is not a function, character or symbol.
I suspect that there is also a problem because many of the rows of observations have missing data.
Alternatively I have tried to just do each variable one at a time using aggregate, and select the min(time) this way:
firstV1 <-aggregate(V1[min(time)]~ID, data=Data, na.rm=T)
From the example dataset, what I would like to see is:
Data
ID V1 V2 V3
1 35 100 5.2
2 25 122 6.2
3 41 120 4.1
Note the '25' for ID2
V1
was from the later observation because the first observation was missing. Same for ID3
V3
.
Input data
structure(list(ID = c(1L, 2L, 3L, 1L, 2L, 3L), V1 = c(35L, 25L,
41L, 25L, NA, 40L), V2 = c(100L, 111L, 120L, NA, 122L, 110L),
V3 = c(5.2, 6.2, 4.2, NA, 6.2, 4.1), time = structure(c(1435906140,
1427885520, 1427894220, 1435906260, 1427885400, 1427894700
), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("ID",
"V1", "V2", "V3", "time"), row.names = c(NA, -6L), class = "data.frame")