This is an interesting question. If I understand correctly, the OP asks to insert additional rows (filling the missing dates) between two contiguous groups of `Status` values if the previous group has `Status == 1` and the current group has `Status == 0`. In addition, I understand that filling up missing dates within a streak of `Status == 1` is not requested.
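As I understand it, the rule can be written as a small predicate on two consecutive observations. `needs_fill()` and `d()` are hypothetical helpers for illustration only (not part of data.table). The last example uses made-up dates to show that a gap inside a streak of 1s is left alone.

```r
# Hypothetical predicate: the dates strictly between two consecutive
# observations are filled (with Status 1) only if the earlier one has
# Status 1, the later one has Status 0, and at least one day is missing
needs_fill <- function(prev_date, prev_status, cur_date, cur_status) {
  as.integer(cur_date - prev_date) > 1L && prev_status == 1L && cur_status == 0L
}

# Hypothetical helper: parse dd-mm-yyyy strings as used in the data
d <- function(x) as.Date(x, format = "%d-%m-%Y")

needs_fill(d("09-01-2020"), 1L, d("11-01-2020"), 0L)  # 1 -> 0 with a gap: fill
needs_fill(d("13-01-2020"), 1L, d("14-01-2020"), 0L)  # 1 -> 0 without a gap
needs_fill(d("14-01-2020"), 0L, d("16-01-2020"), 1L)  # 0 -> 1: not filled
needs_fill(d("17-01-2020"), 1L, d("19-01-2020"), 1L)  # gap inside a streak of 1s (made-up rows)
```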
So here are two different `data.table` approaches:
1. Grouping and prepending additional rows to each `Status == 0` group
This solution borrows from Matt Dowle's answer to Get the last row of a previous group in data.table (see here for another use case).
It creates groups of contiguous streaks of 0/1 values in `Status` (using `rleid()`). For each group, it checks whether rows need to be inserted. If so, the additional rows are prepended to the rows of the current group (using `rbind()`).
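To see what the grouping variable looks like, here is a minimal base-R sketch that mimics `rleid()` on the `Status` values of the example data (section 3), typed in by hand. It uses `rle()` instead of data.table, so it only illustrates the run-length grouping idea, not data.table's implementation.

```r
# Status values of the example data, in row order (typed in by hand)
status <- c(0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L)

# rle() finds the runs of identical values; numbering each run and
# repeating that number for every row of the run mimics rleid(status)
runs <- rle(status)
grp <- rep(seq_along(runs$lengths), runs$lengths)
print(grp)
# [1] 1 1 2 2 2 3 4 5 6 6 7
```

Every row of a run gets the same id, so `by = grp` processes each streak of 0s or 1s as one group.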
```r
library(data.table)
options(datatable.print.class = TRUE)
dt[, timestamp := as.IDate(timestamp, "%d-%m-%Y")] # coerce character date to IDate (integer-based date)
dt[, grp := rleid(Status)] # create groups of consecutive values of Status
dt[, new := ""] # just for test & demonstration
pg <- first(dt) # initialise storage of last row of previous group
dt[, {
  if (first(timestamp) - pg$timestamp > 1L && pg$Status == 1L) {
    # if there is a gap and Status switches from 1 to 0 then fill the gap
    add <- .(timestamp = seq(pg$timestamp + 1L, first(timestamp) - 1L, by = 1L), Status = 1L, new = "*")
  } else {
    # no gap to fill
    add <- .SD[0L]
  }
  pg <- last(.SD) # remember last row of this group for the next group
  rbind(add, .SD) # prepend additional rows
}, by = grp][, grp := NULL][] # remove grouping variable
```
```
     timestamp Status    new
        <IDat>  <int> <char>
 1: 2020-01-05      0
 2: 2020-01-06      0
 3: 2020-01-07      1
 4: 2020-01-08      1
 5: 2020-01-09      1
 6: 2020-01-10      1      *
 7: 2020-01-11      0
 8: 2020-01-13      1
 9: 2020-01-14      0
10: 2020-01-16      1
11: 2020-01-17      1
12: 2020-01-18      1      *
13: 2020-01-19      1      *
14: 2020-01-20      0
```
Note that an enhanced dataset has been used (see below) to allow for more thorough testing. Also, the column `new` has been added just to show where the rows have been inserted.
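For readers without data.table, the same group-wise idea can be sketched in base R. This is a swapped-in technique, not the code above: `fill_gaps_by_group()` is a hypothetical helper that uses `split()` and a `for` loop instead of `by = grp`, and it assumes the rows are already sorted by date.

```r
# Example data from section 3, with dates parsed from dd-mm-yyyy strings
df <- data.frame(
  timestamp = as.Date(c("05-01-2020", "06-01-2020", "07-01-2020", "08-01-2020",
                        "09-01-2020", "11-01-2020", "13-01-2020", "14-01-2020",
                        "16-01-2020", "17-01-2020", "20-01-2020"),
                      format = "%d-%m-%Y"),
  Status = c(0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L)
)

# Hypothetical helper: group-wise gap filling (approach 1) in base R
fill_gaps_by_group <- function(df) {
  runs <- rle(df$Status)
  grp <- rep(seq_along(runs$lengths), runs$lengths)  # like rleid(Status)
  pieces <- split(df, grp)                            # one data frame per run
  out <- vector("list", length(pieces))
  prev <- df[1L, ]  # stands in for the "last row of the previous group"
  for (k in seq_along(pieces)) {
    cur <- pieces[[k]]
    gap <- as.integer(cur$timestamp[1L] - prev$timestamp)
    # runs alternate, so a previous run of 1s implies the current run is 0s
    if (gap > 1L && prev$Status == 1L) {
      add <- data.frame(
        timestamp = seq(prev$timestamp + 1, cur$timestamp[1L] - 1, by = 1),
        Status = 1L
      )
      cur <- rbind(add, cur)  # prepend the missing dates
    }
    prev <- pieces[[k]][nrow(pieces[[k]]), ]  # remember the last original row
    out[[k]] <- cur
  }
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

res <- fill_gaps_by_group(df)
print(res)
```

The loop is explicit about the one piece of state that carries over between groups (`prev`), which is what `pg` does in the data.table version.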
2. Identify gaps, create missing rows, append and re-order
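This approach is different. It identifies the gaps which are to be filled, creates the missing rows, appends them to the original dataset, and re-orders the rows by `timestamp`. It starts from a fresh `dt` as created in section 3, because approach 1 adds the columns `grp` and `new` to `dt` by reference: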
```r
library(data.table)
options(datatable.print.class = TRUE)
library(magrittr) # piping used to improve readability
dt[, timestamp := as.IDate(timestamp, "%d-%m-%Y")] # coerce character date to IDate (integer-based date)
lapply(
  # row indices where a switch from 1 to 0 coincides with a date gap
  dt[, .I[timestamp - shift(timestamp, fill = first(timestamp)) > 1L & shift(Status) == 1 & Status == 0]],
  # for each such row i, create the missing dates between rows i - 1 and i
  function(i) dt[, .(timestamp = seq(timestamp[i - 1L] + 1L, timestamp[i] - 1L, by = 1L), Status = 1L)]
) %>%
  c(list(dt)) %>% # append the original rows
  rbindlist() %>%
  .[order(timestamp)] # re-order by date
```
```
     timestamp Status
        <IDat>  <int>
 1: 2020-01-05      0
 2: 2020-01-06      0
 3: 2020-01-07      1
 4: 2020-01-08      1
 5: 2020-01-09      1
 6: 2020-01-10      1
 7: 2020-01-11      0
 8: 2020-01-13      1
 9: 2020-01-14      0
10: 2020-01-16      1
11: 2020-01-17      1
12: 2020-01-18      1
13: 2020-01-19      1
14: 2020-01-20      0
```
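The identify-append-reorder idea can also be sketched in base R, again as a swapped-in technique rather than the data.table code above. `fill_gaps_vectorised()` is a hypothetical helper; `diff()` and a shifted copy of `Status` play the roles of `shift()`, and `do.call(rbind, ...)` plays the role of `rbindlist()`.

```r
# Example data from section 3, with dates parsed from dd-mm-yyyy strings
df <- data.frame(
  timestamp = as.Date(c("05-01-2020", "06-01-2020", "07-01-2020", "08-01-2020",
                        "09-01-2020", "11-01-2020", "13-01-2020", "14-01-2020",
                        "16-01-2020", "17-01-2020", "20-01-2020"),
                      format = "%d-%m-%Y"),
  Status = c(0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L)
)

# Hypothetical helper: identify gaps, create the missing rows, append, re-order
fill_gaps_vectorised <- function(df) {
  n <- nrow(df)
  if (n < 2L) return(df)
  gap  <- c(0L, as.integer(diff(df$timestamp)))  # days since previous row
  prev <- c(NA_integer_, df$Status[-n])          # Status of previous row
  idx  <- which(gap > 1L & prev == 1L & df$Status == 0L)
  extra <- lapply(idx, function(i) data.frame(
    timestamp = seq(df$timestamp[i - 1L] + 1, df$timestamp[i] - 1, by = 1),
    Status = 1L
  ))
  res <- do.call(rbind, c(extra, list(df)))  # append the original rows
  res <- res[order(res$timestamp), ]         # re-order by date
  rownames(res) <- NULL
  res
}

res <- fill_gaps_vectorised(df)
print(res)
```

If no gap qualifies, `extra` is an empty list and the original data are returned unchanged.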
The expression

```r
dt[, .I[timestamp - shift(timestamp, fill = first(timestamp)) > 1L & shift(Status) == 1 & Status == 0]]
```

identifies the gaps to be filled by returning the row indices in the original dataset `dt` before which the additional rows need to be inserted:

```
[1]  6 11
```

So, the additional rows need to be inserted between rows 5 and 6, and between rows 10 and 11, respectively.
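The three conditions can be checked by hand with the intermediate vectors. In this base-R sketch the dates are encoded as plain day-of-January integers for simplicity (an assumption made only for illustration), and `shift()` is emulated by prepending a value and dropping the last element.

```r
# Day of January 2020 and Status of the 11 example rows
day    <- c(5L, 6L, 7L, 8L, 9L, 11L, 13L, 14L, 16L, 17L, 20L)
status <- c(0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L)

gap  <- day - c(day[1L], day[-length(day)])       # like timestamp - shift(timestamp, fill = first(timestamp))
prev <- c(NA_integer_, status[-length(status)])   # like shift(Status)
print(gap)
# [1] 0 1 1 1 1 2 2 1 2 1 3

# rows 7 and 9 also have a gap, but the previous Status is 0
idx <- which(gap > 1L & prev == 1L & status == 0L)
print(idx)
# [1]  6 11
```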
3. Data
The dataset has been expanded to allow for more thorough testing.
```r
dt <- fread(
"timestamp Status
05-01-2020 0
06-01-2020 0
07-01-2020 1
08-01-2020 1
09-01-2020 1
11-01-2020 0
13-01-2020 1
14-01-2020 0
16-01-2020 1
17-01-2020 1
20-01-2020 0")
```
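Note that all solutions posted so far assume that `dt` is ordered by increasing `timestamp`. If not, this can be achieved by

```r
setorder(dt, timestamp)
```

after `timestamp` has been coerced to `IDate`; ordering the original `dd-mm-yyyy` character strings would sort by day first rather than chronologically.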
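Why the conversion has to come first can be seen with a few made-up dates spanning two months (they are not part of the example data). This sketch uses base R's `order()` and `as.Date()`; with data.table, the equivalent is to run the `as.IDate()` line before `setorder(dt, timestamp)`.

```r
# Made-up dd-mm-yyyy strings spanning January and February 2020
raw <- c("05-01-2020", "02-02-2020", "20-01-2020")

# ordering the strings compares the day first, so February lands first
by_string <- raw[order(raw)]
# ordering the parsed dates gives the chronological order
by_date <- raw[order(as.Date(raw, format = "%d-%m-%Y"))]

print(by_string)  # "02-02-2020" "05-01-2020" "20-01-2020"
print(by_date)    # "05-01-2020" "20-01-2020" "02-02-2020"
```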