I have a tibble of time series with over a million rows, covering several IDs with tens of thousands of data points per ID. A minimal example:
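```r
library(dplyr)  # provides tibble() and the verbs used below

timeseries <- tibble(ID = c(101, 101, 101, 101, 101),
                     time = c(1, 2, 3, 4, 5),
                     block = c(0, 0, 0, 0, 0))  # 0 = not inside any event
```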
I have another tibble, a few thousand rows long, that contains the start and end times of events (different for each ID). These events should be marked on the time series so that they can easily be summarized with `summarize()`. There are empty timepoints between events, and also at the start and end of each ID's time series.
```r
blocks <- tibble(ID = c(101, 101),
                 block = c(1, 2),
                 st = c(1, 4),
                 end = c(2, 5))
```
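One tidyverse way to do the marking, sketched under two assumptions: dplyr 1.1.0 or newer (which added inequality joins through `join_by()`), and an inclusive `end` (if the end is exclusive, use `time >= st, time < end` inside `join_by()` instead). The join matches each timepoint to the block of the same ID whose interval contains it; unmatched timepoints get `NA`, recoded here to `0` to match the placeholder in the question.

```r
library(dplyr)  # join_by() with between() requires dplyr >= 1.1.0

timeseries <- tibble(ID = c(101, 101, 101, 101, 101),
                     time = c(1, 2, 3, 4, 5),
                     block = c(0, 0, 0, 0, 0))

blocks <- tibble(ID = c(101, 101),
                 block = c(1, 2),
                 st = c(1, 4),
                 end = c(2, 5))

# Non-equi join: same ID, and time within [st, end] (inclusive on both sides).
# The placeholder `block` column is dropped first so the joined one replaces it.
labelled <- timeseries %>%
  select(-block) %>%
  left_join(blocks, by = join_by(ID, between(time, st, end))) %>%
  mutate(block = coalesce(block, 0)) %>%  # timepoints outside every block -> 0
  select(ID, time, block)

labelled
```

If blocks within one ID can overlap, a timepoint that falls in two blocks appears twice in the result, so comparing `nrow()` before and after the join is a cheap sanity check.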
What is the easiest and fastest way to do this?

My current solution is horribly slow and clunky:
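```r
# Assumes both tibbles are sorted by ID, then by time (timeseries) / st (blocks).
j <- 1
n <- nrow(timeseries)
for (i in seq_len(nrow(blocks))) {
  checkrow <- blocks[i, ]
  # skip rows of earlier IDs (the j <= n guard stops j running past the last row)
  while (j <= n && timeseries$ID[j] < checkrow$ID) j <- j + 1
  # skip timepoints before the block starts
  while (j <= n && timeseries$time[j] < checkrow$st) j <- j + 1
  # mark timepoints until the block ends (end itself is excluded)
  while (j <= n && timeseries$time[j] < checkrow$end) {
    timeseries$block[j] <- checkrow$block
    j <- j + 1
  }
}
```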
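The `while` loops above essentially walk through sorted rows looking for the most recent block start at or before each timepoint. A rolling join expresses that lookup directly. This is a sketch assuming dplyr 1.1.0 or newer (for `closest()` inside `join_by()`), blocks that do not overlap within an ID, and an inclusive `end`:

```r
library(dplyr)  # closest() in join_by() requires dplyr >= 1.1.0

timeseries <- tibble(ID = c(101, 101, 101, 101, 101),
                     time = c(1, 2, 3, 4, 5),
                     block = c(0, 0, 0, 0, 0))

blocks <- tibble(ID = c(101, 101),
                 block = c(1, 2),
                 st = c(1, 4),
                 end = c(2, 5))

# Rolling join: for each timepoint, take the block of the same ID whose start
# is the closest one at or before that time (assumes non-overlapping blocks).
labelled <- timeseries %>%
  select(-block) %>%
  left_join(blocks, by = join_by(ID, closest(time >= st))) %>%
  # The nearest earlier start may belong to a block that has already ended,
  # and timepoints before the first block have no match at all: both -> 0.
  mutate(block = if_else(!is.na(end) & time <= end, block, 0)) %>%
  select(ID, time, block)

labelled
```

Each timepoint matches at most one block start, so the row count of `timeseries` is preserved; whether this beats the `between()` join on the real data would need benchmarking.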
My time series doesn't contain the start and end points as rows with NAs between them, and I don't know how to get it into that shape, so this and this solution don't help.
I'd like to stay within the tidyverse and use vectorized logic instead of loops, but I don't know how. I looked at `map()` but couldn't figure out how to apply it here. I'm sure I'm missing some simple answer.
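The `map()` idea can work too: expand each block into the timepoints it covers, then do an ordinary equality join. This sketch assumes timepoints lie on a regular grid with step 1 (the `by = 1` below is an assumption; swap in the real sampling interval) and inclusive ends. On irregularly sampled data the generated times would not line up with the real ones, and a non-equi join is the safer choice.

```r
library(dplyr)
library(purrr)  # map2()
library(tidyr)  # unnest()

timeseries <- tibble(ID = c(101, 101, 101, 101, 101),
                     time = c(1, 2, 3, 4, 5),
                     block = c(0, 0, 0, 0, 0))

blocks <- tibble(ID = c(101, 101),
                 block = c(1, 2),
                 st = c(1, 4),
                 end = c(2, 5))

# One row per (block, covered timepoint); assumes a sampling step of 1.
block_points <- blocks %>%
  mutate(time = map2(st, end, function(s, e) seq(s, e, by = 1))) %>%
  unnest(time) %>%
  select(ID, time, block)

# Plain equality join on ID and time; unmatched timepoints -> 0.
labelled <- timeseries %>%
  select(-block) %>%
  left_join(block_points, by = c("ID", "time")) %>%
  mutate(block = coalesce(block, 0))

labelled
```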
Edit: I wrote a better version once I was less tired. Using base R subsetting instead of `while` loops was much, much faster. I first pivoted `blocks` to long format, then created an empty `timeseries_complete`, and then ran:
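```r
# Assumes `blocks` is in long format (columns ID, block_nr, Time), sorted by
# Time within each ID, with each block's start row directly followed by its end row.
for (j in unique(blocks$ID)) {
  bl <- blocks %>% filter(ID == j)  # this ID's boundaries only
  ts <- timeseries %>%
    filter(ID == j) %>%
    mutate(bl_nr = NA_integer_)
  # step over (start, end) pairs so the gaps between blocks stay NA
  for (i in seq(1, nrow(bl) - 1, by = 2)) {
    in_block <- ts$time > bl[[i, "Time"]] & ts$time < bl[[i + 1, "Time"]]
    ts[in_block, "bl_nr"] <- bl[[i, "block_nr"]]
  }
  timeseries_complete <- bind_rows(timeseries_complete, ts)
}
```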
This effectively solved the practical problem, but I'd still like to know a tidyverse version.
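With the blocks marked, the per-block summaries become an ordinary grouped `summarize()`. This sketch uses an inner join (so timepoints outside every block are dropped) and adds a hypothetical `value` column, which is not in the question's example, just to have something to summarize. It again assumes dplyr 1.1.0 or newer and inclusive ends.

```r
library(dplyr)  # join_by() with between() requires dplyr >= 1.1.0

# `value` is a hypothetical measurement column added for illustration.
timeseries <- tibble(ID = c(101, 101, 101, 101, 101),
                     time = c(1, 2, 3, 4, 5),
                     value = c(10, 20, 30, 40, 50))

blocks <- tibble(ID = c(101, 101),
                 block = c(1, 2),
                 st = c(1, 4),
                 end = c(2, 5))

# inner_join() keeps only timepoints inside some block;
# each (ID, block) group is then collapsed to one summary row.
block_summary <- timeseries %>%
  inner_join(blocks, by = join_by(ID, between(time, st, end))) %>%
  group_by(ID, block) %>%
  summarize(n_points = n(),
            mean_value = mean(value),
            .groups = "drop")

block_summary
```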