Separate operations on groups of time series values identified by same flag in R

Question

Does anyone have a solution in R for performing separate operations on groups of consecutive values that are a subset of a time series and are identified by a recurring, identical flag?

In the example data set created by the code below, this would mean, for instance, calculating the mean of “value” separately for each group of consecutive days where “flag” == 1.

A typical case in science would be a data set recorded by an instrument that repeatedly executes a calibration procedure and flags the corresponding data with the same flag, but the user needs to evaluate each calibration separately with the same procedure.

Thanks for your suggestions. Jens

library(lubridate)

df <- data.frame(
    date = seq(ymd("2018-01-01"), ymd("2018-06-29"), by = "days"),
    flag = rep( c(rep(1,10), rep(0, 20)), 6),
    value = seq(1,180,1)
  )

score 1 · Accepted Answer · answered Nov 07 '18 at 18:11

The data.table function rleid is great for assigning group IDs to runs of consecutive values. I continue with data.table below, but you could do everything except the rleid part just as well in dplyr or base R.

My answer comes down to using data.table::rleid and then picking your favorite way to take the mean by group (R-FAQ link).

library(data.table)
setDT(df)
df[, r_id := rleid(flag)]
df[flag == 1, list(
    min_date = min(date),
    max_date = max(date),
    mean_value = mean(value)
  ), by = r_id]

#    r_id   min_date   max_date mean_value
# 1:    1 2018-01-01 2018-01-10        5.5
# 2:    3 2018-01-31 2018-02-09       35.5
# 3:    5 2018-03-02 2018-03-11       65.5
# 4:    7 2018-04-01 2018-04-10       95.5
# 5:    9 2018-05-01 2018-05-10      125.5
# 6:   11 2018-05-31 2018-06-09      155.5
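
For readers without data.table, the run IDs can also be built in base R. What follows is a minimal sketch, not part of the original answer: it assumes flag contains no NA values, it rebuilds the example data with as.Date instead of lubridate, and the names cal and res are only illustrative.

```r
# Rebuild the example data using base R only
df <- data.frame(
  date  = seq(as.Date("2018-01-01"), as.Date("2018-06-29"), by = "day"),
  flag  = rep(c(rep(1, 10), rep(0, 20)), 6),
  value = seq(1, 180, 1)
)

# Run ID: starts at 1 and increases by 1 every time flag changes
# (assumes flag has no NA values)
df$r_id <- cumsum(c(TRUE, diff(df$flag) != 0))

# Keep only the flagged rows, then average value within each run
cal <- df[df$flag == 1, ]
res <- aggregate(value ~ r_id, data = cal, FUN = mean)
res
```

This should give the same six group means as the data.table result above. Any other per-group function can be swapped in for mean via the FUN argument.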

That was exactly what I was searching for and perfectly does the job. Thank you so much, @Gregor ! — Jens Daniel Müller, Nov 08 '18 at 13:00
