I have found similar problem like this, but not exactly the same. Here is my problem, I have the following data set:
> ds
id begin end
1 1 2017-01-15 2017-01-17
2 1 2017-01-01 2017-01-03
3 2 2017-02-01 2017-02-28
4 4 2017-04-11 2017-05-11
5 3 2017-02-05 2017-02-10
6 4 2017-03-10 2017-03-20
7 1 2017-01-30 2017-02-03
8 3 2017-02-28 2017-03-09
9 4 2017-02-26 2017-03-05
I want to create the following column: check
that verifies the following condition for each rows with the same id
value:
ds[i,]$begin - ds[i-1,]$end < 30 => 1 # for each row i
otherwise is 0
. When it is the first element of the group, then there is no previous information, so in such case the value of the new column check
will be always zero too.
The difference with other questions (solved via: ave
, dplyr
) I have seen, is that I need to make a calculation that involves subsequent row but also not the same column.
Here is the code:
Defining the data set
id <- c("1", "1", "2", "4", "3", "4", "1", "3")
begin <- c("20170115", "20170101", "20170201",
"20170411",
"20170205", "20170310",
"20170130", "20170228"
)
end <- c("20170117", "20170103", "20170228",
"20170511",
"20170210", "20170320",
"20170203", "20170309"
)
ds <- data.frame(id = id, begin = as.Date(begin, "%Y%m%d"), end = as.Date(end, "%Y%m%d"))
Sorting the information (we need it for the current solution using a for-loop)
idx = order(rank(ds$id), ds$begin, decreasing = FALSE)
ds <- ds[idx,]
Now using the for-loop for assigning the control variable: check
:
ds$check <- numeric(nrow(ds))
ds$check <- NA_integer_
nrep <- -1
for (i in 1:nrow(ds)) {
rowi <- ds[i,]
if (nrep == -1) {# Setting the first element of ds
end.prev <- rowi$end
id.prev <- rowi$id
ds[i,]$check <- 0
nrep = 1
} else {
id.current <- rowi$id
if(id.prev == id.current) {
ds[i,]$check <- ifelse(rowi$begin - end.prev < 30, 1, 0)
} else {
ds[i,]$check <- 0
}
end.prev <- rowi$end
id.prev <- id.current
}
}
Finally the expected output:
> ds
id begin end check
2 1 2017-01-01 2017-01-03 0
1 1 2017-01-15 2017-01-17 1
7 1 2017-01-30 2017-02-03 1
3 2 2017-02-01 2017-02-28 0
5 3 2017-02-05 2017-02-10 0
8 3 2017-02-28 2017-03-09 1
6 4 2017-03-10 2017-03-20 0
4 4 2017-04-11 2017-05-11 1
>
Thanks for any hint.