If I understand the question correctly, the code below should give the desired answer:
library(data.table)
setorder(DT, Client, Stage)
DT[, duration := shift(Stage.Start, type = "lead", fill = Sys.Date()) - Stage.Start,
by = Client][, .(avg.duration = mean(duration)), by = Stage]
Stage avg.duration
1: Stage 1 137.5
2: Stage 2 150.5
3: Stage 3 153.0
4: Stage 4 53.0
Note that I have used the actual date for stages which are not completed yet to avoid NA
and to get a duration to date.
Alternatively,
DT[, duration := shift(Stage.Start, type = "lead") - Stage.Start, by = Client][
, .(avg.duration = mean(duration, na.rm = TRUE)), by = Stage]
will return the expected result (except for slight variations in the figures)
Stage avg.duration
1: Stage 1 33.0
2: Stage 2 150.5
3: Stage 3 NaN
4: Stage 4 NaN
Explanation
OP's intention might become clearer if the data frame is properly ordered by Client
and Stage
:
setorder(DT, Client, Stage)
DT
id Client Stage Stage.Start
1: 1 Client A Stage 1 2017-01-01
2: 4 Client A Stage 2 2017-02-03
3: 5 Client A Stage 3 2017-06-01
4: 2 Client B Stage 1 2017-03-04
5: 3 Client C Stage 2 2017-03-10
6: 6 Client C Stage 4 2017-09-09
Then the durations are calculated for each client, either using the actual date (for a duration to date):
DT[, duration := shift(Stage.Start, type = "lead", fill = Sys.Date()) - Stage.Start,
by = Client][]
id Client Stage Stage.Start duration
1: 1 Client A Stage 1 2017-01-01 33
2: 4 Client A Stage 2 2017-02-03 118
3: 5 Client A Stage 3 2017-06-01 153
4: 2 Client B Stage 1 2017-03-04 242
5: 3 Client C Stage 2 2017-03-10 183
6: 6 Client C Stage 4 2017-09-09 53
or NA
by default:
DT[, duration := shift(Stage.Start, type = "lead") - Stage.Start, by = Client][]
id Client Stage Stage.Start duration
1: 1 Client A Stage 1 2017-01-01 33
2: 4 Client A Stage 2 2017-02-03 118
3: 5 Client A Stage 3 2017-06-01 NA
4: 2 Client B Stage 1 2017-03-04 NA
5: 3 Client C Stage 2 2017-03-10 183
6: 6 Client C Stage 4 2017-09-09 NA
Data
library(data.table)
DT <- fread("id, Client, Stage, Stage.Start
1, Client A, Stage 1, 2017/01/01
2, Client B, Stage 1, 2017/03/04
3, Client C, Stage 2, 2017/03/10
4, Client A, Stage 2, 2017/02/03
5, Client A, Stage 3, 2017/06/01
6, Client C, Stage 4, 2017/09/09")
DT[, Stage.Start := as.IDate(Stage.Start, "%Y/%m/%d")]