My dataset contains many IDs, each with at least 100 observations and one observation per date. The date ranges of different IDs overlap. See the fake dataset below, with 10 IDs:
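id = 10      # number of IDs
m = 2 * id   # one start date and one end date per ID
a_0 = seq(as.Date("2001-01-01"), as.Date("2001-12-31"), by = "day")
# Sorted random dates, filled column-wise: column 1 = start dates, column 2 = end dates
a_1 = matrix(sort(sample(as.character(a_0), m)), ncol = 2)
a_2 = list()
for (i in 1:nrow(a_1)) {
  # Daily sequence between the start and end date of ID i
  a_3 = seq(as.Date(a_1[i, 1]), as.Date(a_1[i, 2]), by = "day")
  # Random values in [0, 1], rounded to one decimal
  a_4 = data.frame(i, as.character(a_3), round(runif(length(a_3)), 1))
  colnames(a_4) = c("id", "date", "value")
  a_2[[i]] = a_4
}
DF = dplyr::bind_rows(a_2)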
dim(DF)
table(DF[, 1])
For each ID, I would like to randomly sample consecutive observations over a fixed number of days, similar to what was asked here: "Sample n consecutive dates from a random starting date for each index in a data frame". So, something like this (e.g., with 10 consecutive days):
library(dplyr)
df.sample <- arrange(DF, date) %>%
  group_by(id) %>%
  mutate(date = as.Date(date), start = sample(date, 1)) %>%
  filter(date >= start & date <= (start + 9))
However, I need to randomly sample several time periods for each ID: two periods of 10 days and one period of 25 days. The sampled periods cannot overlap within an ID, i.e. the same date cannot be sampled twice for the same ID.
On top of that, the first and last observations of each ID should not be sampled. Finally, there should always be at least one unsampled observation between any two sampled periods.
I struggle to find a simple solution that would include all these constraints. Some help would be much appreciated.
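One possible direction, as a rough sketch rather than a definitive solution: treat the observations that are left over (after removing the first and last observation, the 45 sampled days, and the two mandatory one-observation gaps) as "slack", and split that slack at random into the gaps before, between, and after the periods. The sketch below uses base R instead of dplyr. It assumes each ID has one row per day with no missing days and at least 49 observations. The names `sample_windows`, `sizes`, `toy`, and `period` are hypothetical and only serve the illustration.

```r
# Hypothetical helper (not from any package): draws start/end row positions
# for periods of the given sizes among n consecutive daily observations.
sample_windows <- function(n, sizes = c(10, 10, 25)) {
  k <- length(sizes)
  # Observations left over once the first/last observation, the periods and
  # the k - 1 mandatory one-observation gaps are accounted for.
  slack <- (n - 2) - sum(sizes) - (k - 1)
  if (slack < 0) stop("too few observations for the requested periods")
  sizes <- sizes[sample.int(k)]             # random order of the periods
  bars  <- sort(sample.int(slack + k, k))   # random split of the slack
  extra <- diff(c(0, bars)) - 1             # extra unsampled rows before each period
  # Each period is preceded by one mandatory skipped row plus its extra rows.
  start <- cumsum(1 + extra) + cumsum(c(0, head(sizes, -1))) + 1
  data.frame(period = seq_len(k), size = sizes,
             start = start, end = start + sizes - 1)
}

# Toy data with the same columns as DF: one row per id and day (assumption).
toy <- do.call(rbind, lapply(1:3, function(i) {
  n <- c(100, 60, 49)[i]
  data.frame(id = i,
             date = as.Date("2001-01-01") + seq_len(n) - 1,
             value = runif(n))
}))

# Sample the three periods independently within each id.
sampled <- do.call(rbind, lapply(split(toy, toy$id), function(d) {
  d <- d[order(d$date), ]
  w <- sample_windows(nrow(d))
  rows <- unlist(Map(seq, w$start, w$end))
  cbind(d[rows, ], period = rep(w$period, w$size))
}))
```

To try this on the fake dataset, `toy` would be replaced by `DF`. IDs with fewer than 49 observations would trigger the error in `sample_windows`, so they would need to be filtered out or handled separately first.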