I have a millions row long data.table
with about 20 counter-style columns. Those columns display various storage system operations' counters increasing in time. Sometimes however, the counter gets reset on the monitored system and the single observation value is lower than a previous one.
I need to calculate a opsdiff
column, which would contain an arithmetical difference (expected to be positive) of subsequent values of same type based on column type
. When a counter reset situation is identified - ie. the difference is negative, the actual value of the counter should be used.
> dt <- data.table(type=rep(c("a","b"),each=6),opscounter=c(105609,106316,106705,489,723,1250))
> dt
type opscounter
1: a 105609
2: a 106316
3: a 106705
4: a 489
5: a 723
6: a 1250
7: b 105609
8: b 106316
9: b 106705
10: b 489
11: b 723
12: b 1250
The result I wish to get:
> dt
type opscounter opsdiff
1: a 105609 NA
2: a 106316 707
3: a 106705 389
4: a 489 489
5: a 723 234
6: a 1250 527
7: b 105609 NA
8: b 106316 707
9: b 106705 389
10: b 489 489
11: b 723 234
12: b 1250 527
>
I tried to build a construction based on ifelse()
but I don't know how to address the current row's opscounter
value in else part and neither the dual call of diff()
is perhaps an efficient usage:
dt[,opsdiff:=c(NA, ifelse(diff(opscounter)>0, diff(opscounter), opscounter)), by=type]
How can I correctly calculate the opsdiff
column?