
Does anyone have a solution in R for performing separate operations on groups of consecutive values that are a subset of a time series and are identified by a recurring, identical flag?

In the example data set created by the code below, this would mean, for example, calculating the mean of “value” separately for each group of consecutive days where “flag” == 1.

A typical case in science is a data set recorded by an instrument that repeatedly runs a calibration procedure and marks the corresponding data with the same flag each time, while the user needs to evaluate each calibration separately using the same procedure.

Thanks for your suggestions. Jens

library(lubridate)

# 180 daily values: 10 days with flag 1, then 20 days with flag 0, repeated 6 times
df <- data.frame(
    date = seq(ymd("2018-01-01"), ymd("2018-06-29"), by = "days"),
    flag = rep( c(rep(1,10), rep(0, 20)), 6),
    value = seq(1,180,1)
  )


The data.table function rleid is great for giving group IDs to runs of consecutive values. I continue with data.table below, but you could do everything except the rleid part just as well in dplyr or base R.
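To see what a run ID is (or to avoid the data.table dependency for that one step), here is a minimal base R sketch that numbers runs of identical consecutive values: a new run starts wherever a value differs from the previous one, and a cumulative sum of those "run start" markers gives the ID. The helper name run_id is hypothetical (not part of any package), and the sketch assumes the vector contains no NA values; it is not meant to reproduce every detail of rleid.

```r
# Hypothetical helper: number runs of identical consecutive values.
# Assumes x has no NA values.
run_id <- function(x) {
  if (length(x) == 0) return(integer(0))
  # TRUE where a value differs from its predecessor; the first element always starts run 1
  starts <- c(TRUE, x[-1] != x[-length(x)])
  # cumulative count of run starts = run ID for every element
  cumsum(starts)
}

flag <- c(1, 1, 1, 0, 0, 1, 1, 0)
run_id(flag)
#> [1] 1 1 1 2 2 3 3 4
```

Because it only compares neighbouring elements, the same helper works for character or factor flags too, not just numeric ones.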

My answer comes down to this: use data.table::rleid, then pick your favorite way to take the mean by group (R-FAQ link).

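library(data.table)
setDT(df)                     # convert df to a data.table in place
df[, r_id := rleid(flag)]     # run ID: increments each time flag changes value
df[flag == 1, list(           # keep only the flagged rows, then summarise each run
    min_date = min(date),
    max_date = max(date),
    mean_value = mean(value)
  ), by = r_id]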

#    r_id   min_date   max_date mean_value
# 1:    1 2018-01-01 2018-01-10        5.5
# 2:    3 2018-01-31 2018-02-09       35.5
# 3:    5 2018-03-02 2018-03-11       65.5
# 4:    7 2018-04-01 2018-04-10       95.5
# 5:    9 2018-05-01 2018-05-10      125.5
# 6:   11 2018-05-31 2018-06-09      155.5
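Since everything except the run IDs could also be done in base R, here is a minimal base R sketch of the whole pipeline. It rests on a few assumptions: flag is numeric (so diff() applies), as.Date() replaces lubridate::ymd() to keep the block self-contained, and the names flagged, runs and res are only illustrative. On the example data it should give the same six flagged runs, dates and means as the data.table output above.

```r
# Example data rebuilt without lubridate
df <- data.frame(
  date  = seq(as.Date("2018-01-01"), as.Date("2018-06-29"), by = "day"),
  flag  = rep(c(rep(1, 10), rep(0, 20)), 6),
  value = 1:180
)

# Run IDs in base R: a new run starts wherever flag changes (assumes numeric flag, no NA)
df$r_id <- cumsum(c(TRUE, diff(df$flag) != 0))

# Keep flagged rows and split them into one data frame per run (ordered by run ID)
flagged <- df[df$flag == 1, ]
runs <- split(flagged, flagged$r_id)

# Summarise each run, then stack the one-row summaries
res <- do.call(rbind, lapply(runs, function(g) data.frame(
  r_id       = g$r_id[1],
  min_date   = min(g$date),
  max_date   = max(g$date),
  mean_value = mean(g$value)
)))
rownames(res) <- NULL
res
```

If only the means are needed, aggregate(value ~ r_id, data = flagged, FUN = mean) is a shorter base R alternative for the last step.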
Gregor Thomas