After reading quite a few posts about this and trying the solutions for my data I still do not get the desired result. Basically I have a data.frame
with two time columns: TimeStamp
and StartTime
and one column that acts as an identifier for the event: ID
. I want to create a third column where if ID == -999
(representing non-event) then TimeStamp
column is used otherwise StartTime
should be used.
To make a bit more complicated: so, the data represents 'blocks' of events. Between the events there is time where 'nothing' happens i.e. ID == -999
. In that case I want the column to be filled by the first element of that block.
Here is oversimplified version of my data:
ID <- rep(c(84,-999,88),c(3,3,4))
f <- as.POSIXct("09:55:45", format = "%H:%M:%S")
t <- as.POSIXct("10:05:45", format = "%H:%M:%S")
TimeStamp <- seq.POSIXt(from = f, to = t, length.out = 10)
StartTime <- as.POSIXct(rep(c("09:54:12",NA,"10:02:25"),c(3,3,4)), format = "%H:%M:%S")
df <- data.frame(TimeStamp,StartTime,ID)
This is the result I would like to have:
> df[,"Time"] <- rep(c("09:54","09:59","10:02"), c(3,3,4))
> df
# TimeStamp StartTime ID Time
#1 2015-04-13 09:55:45 2015-04-13 09:54:12 84 09:54
#2 2015-04-13 09:56:51 2015-04-13 09:54:12 84 09:54
#3 2015-04-13 09:57:58 2015-04-13 09:54:12 84 09:54
#4 2015-04-13 09:59:05 <NA> -999 09:59
#5 2015-04-13 10:00:11 <NA> -999 09:59
#6 2015-04-13 10:01:18 <NA> -999 09:59
#7 2015-04-13 10:02:25 2015-04-13 10:02:25 88 10:02
#8 2015-04-13 10:03:31 2015-04-13 10:02:25 88 10:02
#9 2015-04-13 10:04:38 2015-04-13 10:02:25 88 10:02
#10 2015-04-13 10:05:45 2015-04-13 10:02:25 88 10:02
I have tried things like ifelse
and sapply
. Its not really working. The solution I have up till now is by extracting all the non-events (ID == -999
). Then with another ID variable that identifies all the unique events/non-events aggregate
to find the first TimeStamp
with min
function. Then I have two different Time
columns which I join using the solution in this post.
It works, but I think there is a more elegant and direct way to do this. But being a R-newbie I can't solve it quite yet.
Any suggestions?
BTW: I hope its a bit clear, please tell me if its not.
EDIT: I don't think my question is a duplicate, because those answers did not work in my case. It doesn't solve the added complication I tried to explain above.